NOVEL METHODS OF CONSTRUCTING LIBRARIES OF GENETIC
PACKAGES THAT COLLECTIVELY DISPLAY THE MEMBERS OF A 
DIVERSE FAMILY OF PEPTIDES, POLYPEPTIDES OR PROTEINS 

The present invention relates to constructing
libraries of genetic packages that display a member of
a diverse family of peptides, polypeptides or proteins
and collectively display at least a portion of the
diversity of the family. In a preferred embodiment,
the displayed polypeptides are human Fabs.

More specifically, the invention is directed
to methods of cleaving single-stranded nucleic
acids at chosen locations, the cleaved nucleic acids
encoding, at least in part, the peptides, polypeptides
or proteins displayed on the genetic packages of the
libraries of the invention. In a preferred embodiment,
the genetic packages are filamentous phage or
phagemids.

The present invention further relates to
methods of screening the libraries of genetic packages
that display useful peptides, polypeptides and proteins
and to the peptides, polypeptides and proteins
identified by such screening.
BACKGROUND OF THE INVENTION

It is now common practice in the art to
prepare libraries of genetic packages that display a
member of a diverse family of peptides, polypeptides or
proteins and collectively display at least a portion of
the diversity of the family. In many common libraries,
the displayed peptides, polypeptides or proteins are
related to antibodies. Often, they are Fabs or single
chain antibodies.

In general, the DNAs that encode members of
the families to be displayed must be amplified before
they are cloned and used to display the desired member
on the surface of a genetic package. Such
amplification typically makes use of forward and
backward primers.

Such primers can be complementary to
sequences native to the DNA to be amplified or
complementary to oligonucleotides attached at the 5' or
3' ends of that DNA. Primers that are complementary to
sequences native to the DNA to be amplified are
disadvantaged in that they bias the members of the
families to be displayed. Only those members that
contain a sequence in the native DNA that is
substantially complementary to the primer will be
amplified. Those that do not will be absent from the
family. For those members that are amplified, any
diversity within the primer region will be suppressed.

For example, in European patent 368,684 B1,
the primer that is used is at the 5' end of the VH
region of an antibody gene. It anneals to a sequence
region in the native DNA that is said to be
"sufficiently well conserved" within a single species.
Such a primer will bias the members amplified to those
having this "conserved" region. Any diversity within
this region is extinguished.

It is generally accepted that human antibody
genes arise through a process that involves a
combinatorial selection of V and J or V, D, and J
followed by somatic mutations. Although most diversity
occurs in the Complementarity Determining Regions (CDRs),
diversity also occurs in the more conserved Framework
Regions (FRs) and at least some of this diversity
confers or enhances specific binding to antigens (Ag).
As a consequence, libraries should contain as much of
the CDR and FR diversity as possible.

To clone the amplified DNAs for display on a
genetic package of the peptides, polypeptides or
proteins that they encode, the DNAs must be cleaved to
produce appropriate ends for ligation to a vector.
Such cleavage is generally effected using restriction
endonuclease recognition sites carried on the primers.
When the primers are at the 5' end of DNA produced from
reverse transcription of RNA, such restriction leaves
deleterious 5' untranslated regions in the amplified
DNA. These regions interfere with expression of the
cloned genes and thus the display of the peptides,
polypeptides and proteins coded for by them.

SUMMARY OF THE INVENTION

It is an object of this invention to provide
novel methods for constructing libraries of genetic
packages that display a member of a diverse family of
peptides, polypeptides or proteins and collectively
display at least a portion of the diversity of the
family. These methods are not biased toward DNAs that
contain native sequences that are complementary to the
primers used for amplification. They also enable any
sequences that may be deleterious to expression to be
removed from the amplified DNA before cloning and
displaying.

It is another object of this invention to
provide a method for cleaving single-stranded nucleic
acid sequences at a desired location, the method
comprising the steps of:

(i) contacting the nucleic acid with a
single-stranded oligonucleotide, the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired and
including a sequence that with its complement
in the nucleic acid forms a restriction
endonuclease recognition site that on
restriction results in cleavage of the
nucleic acid at the desired location; and

(ii) cleaving the nucleic acid solely at
the recognition site formed by the
complementation of the nucleic acid and the
oligonucleotide;

the contacting and the cleaving steps being performed
at a temperature sufficient to maintain the nucleic
acid in substantially single-stranded form, the
oligonucleotide being functionally complementary to the
nucleic acid over a large enough region to allow the
two strands to associate such that cleavage may occur
at the chosen temperature and at the desired location,
and the cleavage being carried out using a restriction
endonuclease that is active at the chosen temperature.

It is a further object of this invention to
provide an alternative method for cleaving
single-stranded nucleic acid sequences at a desired location,
the method comprising the steps of:

(i) contacting the nucleic acid with a
partially double-stranded oligonucleotide,
the single-stranded region of the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired, and the
double-stranded region of the oligonucleotide
having a Type II-S restriction endonuclease
recognition site, whose cleavage site is
located at a known distance from the
recognition site; and

(ii) cleaving the nucleic acid solely at
the cleavage site formed by the
complementation of the nucleic acid and the
single-stranded region of the
oligonucleotide;

the contacting and the cleaving steps being performed
at a temperature sufficient to maintain the nucleic
acid in substantially single-stranded form, the
oligonucleotide being functionally complementary to the
nucleic acid over a large enough region to allow the
two strands to associate such that cleavage may occur
at the chosen temperature and at the desired location,
and the cleavage being carried out using a restriction
endonuclease that is active at the chosen temperature.

It is another object of the present
invention to provide a method of capturing DNA
molecules that comprise a member of a diverse family of
DNAs and collectively comprise at least a portion of
the diversity of the family. These DNA molecules in
single-stranded form have been cleaved by one of the
methods of this invention. This method involves
ligating the individual single-stranded DNA members of
the family to a partially duplex DNA complex. The
method comprises the steps of:

(i) contacting a single-stranded nucleic
acid sequence that has been cleaved with a
restriction endonuclease with a partially
double-stranded oligonucleotide, the
single-stranded region of the oligonucleotide being
functionally complementary to the nucleic
acid in the region that remains after
cleavage, the double-stranded region of the
oligonucleotide including any sequences
necessary to return the sequences that remain
after cleavage into proper reading frame for
expression and containing a restriction
endonuclease recognition site 5' of those
sequences; and

(ii) cleaving the partially
double-stranded oligonucleotide sequence solely at
the restriction endonuclease recognition site
contained within the double-stranded region
of the partially double-stranded
oligonucleotide.

It is another object of this invention to
prepare libraries that display a diverse family of
peptides, polypeptides or proteins and collectively
display at least part of the diversity of the family,
using the methods and DNAs described above.

It is an object of this invention to screen
those libraries to identify useful peptides,
polypeptides and proteins and to use those substances
in human therapy.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic of various methods that
may be employed to amplify VH genes without using
primers specific for VH sequences.

FIG. 2 is a schematic of various methods that
may be employed to amplify VL genes without using VL
sequences.

FIG. 3 depicts gel analysis of cleaved kappa
DNA from Example 2.

FIG. 4 depicts gel analysis of cleaved kappa
DNA from Example 2.

FIG. 5 depicts gel analysis of amplified
kappa DNA from Example 2.

FIG. 6 depicts gel purified amplified kappa
DNA from Example 2.

TERMS 

In this application, the following terms and 
abbreviations are used: 



Sense strand          The upper strand of ds DNA as
                      usually written. In the sense
                      strand, 5'-ATG-3' codes for
                      Met.

Antisense strand      The lower strand of ds DNA as
                      usually written. In the
                      antisense strand, 3'-TAC-5'
                      would correspond to a Met
                      codon in the sense strand.



WO 01/79481 



PCT/US01/12454 



Forward primer        A "forward" primer is
                      complementary to a part of the
                      sense strand and primes for
                      synthesis of a new
                      antisense-strand molecule. "Forward
                      primer" and "lower-strand
                      primer" are equivalent.

Backward primer       A "backward" primer is
                      complementary to a part of the
                      antisense strand and primes
                      for synthesis of a new
                      sense-strand molecule. "Backward
                      primer" and "top-strand
                      primer" are equivalent.

Bases                 Bases are specified either by
                      their position in a vector or
                      gene or by their position within
                      a gene by codon and base. For
                      example, "89.1" is the first
                      base of codon 89, and "89.2" is
                      the second base of codon 89.

Sv                    Streptavidin

Ap                    Ampicillin

                      A gene conferring ampicillin
                      resistance.

RE                    Restriction endonuclease

URE                   Universal restriction
                      endonuclease

Functionally          Two sequences are sufficiently
complementary         complementary so as to anneal
                      under the chosen conditions.

RERS                  Restriction endonuclease
                      recognition site

AA                    Amino acid

PCR                   Polymerase chain reaction

GLGs                  Germline genes

Ab                    Antibody: an immunoglobulin.
                      The term also covers any
                      protein having a binding
                      domain which is homologous to
                      an immunoglobulin binding
                      domain. A few examples of
                      antibodies within this
                      definition are, inter alia,
                      immunoglobulin isotypes and the
                      Fab, F(ab')2, scFv, Fv, dAb and
                      Fd fragments.

Fab                   Two chain molecule comprising
                      an Ab light chain and part of
                      a heavy chain.

scFv                  A single-chain Ab comprising
                      either VH::linker::VL or
                      VL::linker::VH

w.t.                  Wild type

HC                    Heavy chain

LC                    Light chain

VK                    A variable domain of a kappa
                      light chain.

VH                    A variable domain of a heavy
                      chain.

VL                    A variable domain of a lambda
                      light chain.


In this application, all references referred to are 
specifically incorporated by reference. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The nucleic acid sequences that are useful in
the methods of this invention, i.e., those that encode
at least in part the individual peptides, polypeptides
and proteins displayed on the genetic packages of this
invention, may be naturally occurring, synthetic or a
combination thereof. They may be mRNA, DNA or cDNA.
In the preferred embodiment, the nucleic acids encode
antibodies. Most preferably, they encode Fabs.

The nucleic acids useful in this invention
may be naturally diverse, synthetic diversity may be
introduced into those naturally diverse members, or the
diversity may be entirely synthetic. For example,
synthetic diversity can be introduced into one or more
CDRs of antibody genes.

Synthetic diversity may be created, for
example, through the use of TRIM technology (U.S.
5,869,644). TRIM technology allows control over
exactly which amino-acid types are allowed at
variegated positions and in what proportions. In TRIM
technology, codons to be diversified are synthesized
using mixtures of trinucleotides. This allows any set
of amino acid types to be included in any proportion.

Another alternative that may be used to
generate diversified DNA is mixed oligonucleotide
synthesis. With TRIM technology, one could allow Ala
and Trp alone. With mixed oligonucleotide synthesis, a
mixture that included Ala and Trp would also
necessarily include Ser and Gly. The amino-acid types
allowed at the variegated positions are picked with
reference to the structure of antibodies, or other
peptides, polypeptides or proteins of the family, the
observed diversity in germline genes, the
somatic mutations frequently observed, and the desired
areas and types of variegation.
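
The Ala/Trp example can be checked by enumerating codons. The following is a minimal sketch, assuming the standard genetic code and assuming that Ala is encoded as GCG and Trp as TGG (the only Trp codon); the function names are illustrative and are not part of the disclosure. Mixing bases position by position to cover both codons yields four codons, whereas a mixture of two trinucleotides yields only the two intended ones.

```python
from itertools import product

# Standard genetic code laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def mixed_base_codons(pos1, pos2, pos3):
    """Every codon made when each codon position is a mixture of bases."""
    return ["".join(codon) for codon in product(pos1, pos2, pos3)]


def encoded_amino_acids(codons):
    """Sorted one-letter amino-acid types encoded by a set of codons."""
    return sorted({CODON_TABLE[codon] for codon in codons})


# Mixed oligonucleotide synthesis: G/T at base 1, C/G at base 2, G at base 3
# is the smallest per-position mixture that contains both GCG and TGG.
mixed = mixed_base_codons("GT", "CG", "G")

# Trinucleotide (TRIM-style) synthesis: only the chosen codons are present.
trim = ["GCG", "TGG"]

print(encoded_amino_acids(mixed))  # ['A', 'G', 'S', 'W']
print(encoded_amino_acids(trim))   # ['A', 'W']
```

The extra codons GGG (Gly) and TCG (Ser) arise because per-position mixtures generate every combination of the allowed bases.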

In a preferred embodiment of this invention,
the nucleic acid sequences for at least one CDR or
other region of the peptides, polypeptides or proteins
of the family are cDNAs produced by reverse
transcription from mRNA. More preferably, the mRNAs
are obtained from peripheral blood cells, bone marrow
cells, spleen cells or lymph node cells (such as
B-lymphocytes or plasma cells) that express members of
naturally diverse sets of related genes. More
preferably, the mRNAs encode a diverse family of
antibodies. Most preferably, the mRNAs are obtained
from patients suffering from at least one autoimmune
disorder or cancer. Preferably, mRNAs from patients
representing a high diversity of autoimmune diseases,
such as systemic lupus erythematosus, systemic
sclerosis, rheumatoid arthritis, antiphospholipid
syndrome and vasculitis, are used.

In a preferred embodiment of this invention,
the cDNAs are produced from the mRNAs using reverse
transcription. In this preferred embodiment, the mRNAs
are separated from the cell and degraded using standard
methods, such that only the full length (i.e., capped)
mRNAs remain. The cap is then removed and reverse
transcription used to produce the cDNAs.

The reverse transcription of the first
(antisense) strand can be done in any manner with any
suitable primer. See, e.g., HJ de Haard et al.,
Journal of Biological Chemistry, 274(26):18218-30
(1999). In the preferred embodiment of this invention
where the mRNAs encode antibodies, primers that are
complementary to the constant regions of antibody genes
may be used. Those primers are useful because they do
not generate bias toward subclasses of antibodies. In
another embodiment, poly-dT primers may be used (and
may be preferred for the heavy-chain genes).
Alternatively, sequences complementary to the primer
may be attached to the termini of the antisense strand.

In one preferred embodiment of this
invention, the reverse transcriptase primer may be
biotinylated, thus allowing the cDNA product to be
immobilized on streptavidin (Sv) beads. Immobilization
can also be effected using a primer labeled at the 5'
end with one of a) free amine group, b) thiol, c)
carboxylic acid, or d) another group not found in DNA
that can react to form a strong bond to a known partner
on an insoluble medium. If, for example, a free amine
(preferably primary amine) is provided at the 5' end of
a DNA primer, this amine can be reacted with carboxylic
acid groups on a polymer bead using standard
amide-forming chemistry. If such preferred immobilization is
used during reverse transcription, the top strand RNA
is degraded using well-known enzymes, such as a
combination of RNAseH and RNAseA, either before or
after immobilization.

The nucleic acid sequences useful in the
methods of this invention are generally amplified
before being used to display the peptides, polypeptides
or proteins that they encode. Prior to amplification,
the single-stranded DNAs may be cleaved using either of
the methods described above. Alternatively, the
single-stranded DNAs may be amplified and then cleaved
using one of those methods.

Any of the well known methods for amplifying
nucleic acid sequences may be used for such
amplification. Methods that maximize, and do not bias,
diversity are preferred. In a preferred embodiment of
this invention where the nucleic acid sequences are
derived from antibody genes, the present invention
preferably utilizes primers in the constant regions of
the heavy and light chain genes and primers to a
synthetic sequence that is attached at the 5' end of
the sense strand. Priming at such a synthetic sequence
avoids the use of sequences within the variable regions
of the antibody genes. Those variable region priming
sites generate bias against V genes that are either of
rare subclasses or that have been mutated at the
priming sites. This bias is partly due to suppression
of diversity within the primer region and partly due to
lack of priming when many mutations are present in the
region complementary to the primer. The methods
disclosed in this invention have the advantage of not
biasing the population of amplified antibody genes for
particular V gene types.

The synthetic sequences may be attached to
the 5' end of the DNA strand by various methods well
known for ligating DNA sequences together. RT
CapExtention is one preferred method.

In RT CapExtention (derived from Smart
PCR(TM)), a short overlap (5'-...GGG-3' in the
upper-strand primer (USP-GGG) complements 3'-CCC...-5' in the
lower strand) and reverse transcriptases are used so
that the reverse complement of the upper-strand primer
is attached to the lower strand.

In a preferred embodiment of this invention
the upper strand or lower strand primer may also be
biotinylated or labeled at the 5' end with one of a)
free amino group, b) thiol, c) carboxylic acid and d)
another group not found in DNA that can react to form a
strong bond to a known partner on an insoluble medium.
These can then be used to immobilize the labeled strand
after amplification. The immobilized DNA can be either
single or double-stranded.

FIG. 1 shows a schematic of the amplification
of VH genes. FIG. 1, Panel A shows a primer specific
to the poly-dT region of the 3' UTR priming synthesis
of the first, lower strand. Primers that bind in the
constant region are also suitable. Panel B shows the
lower strand extended at its 3' end by three Cs that
are not complementary to the mRNA. Panel C shows the
result of annealing a synthetic top-strand primer
ending in GGG, which hybridizes to the 3'-terminal
CCC, and continuing the reverse transcription so that
the lower strand is extended by the reverse complement of the
synthetic primer sequence. Panel D shows the result of
PCR amplification using a 5' biotinylated synthetic
top-strand primer that replicates the 5' end of the
synthetic primer of panel C and a bottom-strand primer
complementary to part of the constant domain. Panel E
shows immobilized double-stranded (ds) cDNA obtained by
using a 5'-biotinylated top-strand primer.

FIG. 2 shows a similar schematic for
amplification of VL genes. FIG. 2, Panel A shows a
primer specific to the constant region at or near the
3' end priming synthesis of the first, lower strand.
Primers that bind in the poly-dT region are also
suitable. Panel B shows the lower strand extended at
its 3' end by three Cs that are not complementary to
the mRNA. Panel C shows the result of annealing a
synthetic top-strand primer ending in GGG, which
hybridizes to the 3'-terminal CCC, and continuing the
reverse transcription so that the lower strand is extended by the
reverse complement of the synthetic primer sequence.
Panel D shows the result of PCR amplification using a
5' biotinylated synthetic top-strand primer that
replicates the 5' end of the synthetic primer of panel
C and a bottom-strand primer complementary to part of
the constant domain. The bottom-strand primer also
contains a useful restriction endonuclease site, such
as AscI. Panel E shows immobilized ds cDNA obtained by
using a 5'-biotinylated top-strand primer.

In FIGs. 1 and 2, each V gene consists of a
5' untranslated region (UTR) and a secretion signal,
followed by the variable region, followed by a constant
region, followed by a 3' untranslated region (which
typically ends in poly-A). An initial primer for
reverse transcription may be complementary to the
constant region or to the poly A segment of the 3'-UTR.
For human heavy-chain genes, a primer of 15 T residues is
preferred. Reverse transcriptases attach several C
residues to the 3' end of the newly synthesized DNA.
RT CapExtention exploits this feature. The reverse
transcription reaction is first run with only a
lower-strand primer. After about 1 hour, a primer ending in
GGG (USP-GGG) and more RTase are added. This causes
the lower-strand cDNA to be extended by the reverse
complement of the USP-GGG up to the final GGG. Using
one primer identical to part of the attached synthetic
sequence and a second primer complementary to a region
of known sequence at the 3' end of the sense strand,
all the V genes are amplified irrespective of their V
gene subclass.
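
The strand bookkeeping in RT CapExtention can be followed with a short string model. This is only a sketch under stated assumptions: the reverse transcriptase is taken to add exactly three untemplated Cs (the text says "several"), the sequences are made-up placeholders, and the function names are hypothetical. It shows that after extension the upper strand carries the synthetic primer sequence directly upstream of the original 5' end, whatever the V gene sequence is.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def rt_cap_extension(sense, usp_ggg, tail="CCC"):
    """Model the lower (antisense) strand made by RT CapExtention.

    sense   -- mRNA sequence written as DNA, 5'->3' (hypothetical input)
    usp_ggg -- upper-strand primer ending in GGG (USP-GGG)
    tail    -- untemplated Cs added by the reverse transcriptase (assumed CCC)
    """
    if not usp_ggg.endswith("GGG"):
        raise ValueError("upper-strand primer must end in GGG")
    # First-strand synthesis copies the mRNA; the 3' end of the new
    # lower strand corresponds to the 5' end of the mRNA.
    lower = revcomp(sense)
    # The reverse transcriptase adds Cs that are not templated by the mRNA.
    lower += tail
    # USP-GGG anneals to the C tail, and the lower strand is extended by
    # the reverse complement of the rest of the primer.
    lower += revcomp(usp_ggg[:-3])
    return lower


# Placeholder sequences, chosen only for illustration.
sense_mrna = "ACTTGATGGAGTTTGGGCTGAGC"
usp = "GATTACAGTCGACGGG"

lower_strand = rt_cap_extension(sense_mrna, usp)
upper_strand = revcomp(lower_strand)

# The synthetic primer sequence now precedes the native 5' end, so a primer
# identical to part of it can amplify any member of the family.
print(upper_strand.startswith(usp))
```

Because the priming site is the attached synthetic sequence, the model makes no reference to the variable-region sequence, which is the point of the method.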

After amplification, the DNAs of this
invention are rendered single-stranded. For example,
the strands can be separated by using a biotinylated
primer, capturing the biotinylated product on
streptavidin beads, denaturing the DNA, and washing
away the complementary strand. Depending on which end
of the captured DNA is wanted, one will choose to
immobilize either the upper (sense) strand or the lower
(antisense) strand.

To prepare the single-stranded amplified DNAs
for cloning into genetic packages so as to effect
display of the peptides, polypeptides or proteins
encoded, at least in part, by those DNAs, they must be
manipulated to provide ends suitable for cloning and
expression. In particular, any 5' untranslated regions
and mammalian signal sequences must be removed and
replaced, in frame, by a suitable signal sequence that
functions in the display host. Additionally, parts of
the variable domains (in antibody genes) may be removed
and replaced by synthetic segments containing synthetic
diversity. The diversity of other gene families may
likewise be expanded with synthetic diversity.

According to the methods of this invention,
there are two ways to manipulate the single-stranded
amplified DNAs for cloning. The first method comprises
the steps of:

(i) contacting the nucleic acid with a
single-stranded oligonucleotide, the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired and
including a sequence that with its complement
in the nucleic acid forms a restriction
endonuclease recognition site that on
restriction results in cleavage of the
nucleic acid at the desired location; and

(ii) cleaving the nucleic acid solely at
the recognition site formed by the
complementation of the nucleic acid and the
oligonucleotide;

the contacting and the cleaving steps being performed
at a temperature sufficient to maintain the nucleic
acid in substantially single-stranded form, the
oligonucleotide being functionally complementary to the
nucleic acid over a large enough region to allow the
two strands to associate such that cleavage may occur
at the chosen temperature and at the desired location,
and the cleavage being carried out using a restriction
endonuclease that is active at the chosen temperature.

In this first method, short oligonucleotides
are annealed to the single-stranded DNA so that
restriction endonuclease recognition sites formed
within the now locally double-stranded regions of the
DNA can be cleaved. In particular, a recognition site
that occurs at the same position in a substantial
fraction of the single-stranded DNAs is identified.

For antibody genes, this can be done using a
catalog of germline sequences. See, e.g.,
"http://www.mrc-cpe.cam.ac.uk/imt-doc/restricted/ok.html."
Updates can be obtained from this site under the
heading "Amino acid and nucleotide sequence
alignments." For other families, similar comparisons
exist and may be used to select appropriate regions for
cleavage and to maintain diversity.

For example, Table 195 depicts the DNA
sequences of the FR3 regions of the 51 known human VH
germline genes. In this region, the genes contain
restriction endonuclease recognition sites shown in
Table 200. Restriction endonucleases that cleave a
large fraction of germline genes at the same site are
preferred over endonucleases that cut at a variety of
sites. Furthermore, it is preferred that there be only
one site for the restriction endonucleases within the
region to which the short oligonucleotide binds on the
single-stranded DNA, e.g., about 10 bases on either
side of the restriction endonuclease recognition site.
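
The selection criterion described above (many germline genes carrying the site at the same position) lends itself to a simple tabulation. The following is a minimal sketch, not the procedure used to build Table 200: the three "germline" sequences are invented placeholders, the helper names are hypothetical, and positions are reported 1-based as count@position, i.e. the number of genes whose recognition site begins at that base.

```python
import re
from collections import Counter

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_regex(site):
    """Compile a recognition site; lookahead keeps overlapping matches."""
    body = "".join(IUPAC[base] for base in site.upper())
    return re.compile("(?=(" + body + "))")


def tabulate_sites(genes, site):
    """Map 1-based start position -> number of genes with the site there."""
    pattern = site_regex(site)
    counts = Counter()
    for seq in genes.values():
        for match in pattern.finditer(seq.upper()):
            counts[match.start() + 1] += 1
    return counts


def notation(enzyme, counts):
    """Format counts as enzyme(count@position), most frequent first."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ", ".join(f"{enzyme}({n}@{pos})" for pos, n in ranked)


# Invented placeholder sequences, NOT real germline genes.
toy_genes = {
    "toy_gene_1": "GGTACAGTCC",
    "toy_gene_2": "TTAACTGTAA",
    "toy_gene_3": "ACGGTTTTTT",
}

counts = tabulate_sites(toy_genes, "ACNGT")
print(notation("Bst4CI", counts))  # Bst4CI(2@4), Bst4CI(1@1)
```

An enzyme whose counts concentrate at one position would be preferred over one whose sites are scattered, and a second site within the oligonucleotide footprint would show up as an additional nearby position.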

An enzyme that cleaves downstream in FR3 is
also more preferable because it captures fewer
mutations in the framework. This may be advantageous
in some cases. However, it is well known that
framework mutations exist and confer or enhance
antibody binding. The present invention, by choice of
appropriate restriction site, allows all or part of FR3
diversity to be captured. Hence, the method also
allows extensive diversity to be captured.

Finally, in the methods of this invention
restriction endonucleases that are active between about
45°C and about 75°C are used. Preferably enzymes that
are active above 50°C, and more preferably active at about
55°C, are used. Such temperatures maintain the nucleic
acid sequence to be cleaved in substantially
single-stranded form.

Enzymes shown in Table 200 that cut many of
the heavy chain FR3 germline genes at a single position
include: MaeIII(24@4), Tsp45I(21@4), HphI(44@5),
BsaJI(23@65), AluI(23@47), BlpI(21@48), DdeI(29@58),
BglII(10@61), MslI(44@72), BsiEI(23@74), EaeI(23@74),
EagI(23@74), HaeIII(25@75), Bst4CI(51@86),
HpyCH4III(51@86), HinfI(38@2), MlyI(18@2), PleI(18@2),
MnlI(31@67), HpyCH4V(21@44), BsmAI(16@11), BpmI(19@12),
XmnI(12@30), and SacI(11@51). (The notation used
means, for example, that BsmAI cuts 16 of the FR3
germline genes with a restriction endonuclease
recognition site beginning at base 11 of FR3.)

For cleavage of human heavy chains in FR3,
the preferred restriction endonucleases are: Bst4CI (or
TaaI or HpyCH4III), BlpI, HpyCH4V, and MslI. Because
ACNGT (the restriction endonuclease recognition site
for Bst4CI, TaaI, and HpyCH4III) is found at a
consistent site in all the human FR3 germline genes,
one of those enzymes is the most preferred for capture
of heavy chain CDR3 diversity. BlpI and HpyCH4V are
complementary. BlpI cuts most members of the VH1 and
VH4 families while HpyCH4V cuts most members of the
VH3, VH5, VH6, and VH7 families. Neither enzyme cuts
VH2s, but this is a very small family, containing only
three members. Thus, these enzymes may also be used in
preferred embodiments of the methods of this invention.

The restriction endonucleases HpyCH4III,
Bst4CI, and TaaI all recognize 5'-ACnGT-3' and cut
upper strand DNA after n and lower strand DNA before
the base complementary to n. This is the most
preferred restriction endonuclease recognition site for
this method on human heavy chains because it is found
in all germline genes. Furthermore, the restriction
endonuclease recognition region (ACnGT) matches the
second and third bases of a tyrosine codon (tay) and
the following cysteine codon (tgy) as shown in Table
206. These codons are highly conserved, especially the
cysteine in mature antibody genes.
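
The cut position on the upper strand can be located with a few lines of code. This is a hedged sketch: the input sequence is an invented placeholder ending in a Tyr-Tyr-Cys motif, the function name is hypothetical, and only the upper-strand cut ("after n") described above is modeled.

```python
import re

# 5'-ACnGT-3', where n is any base.
ACNGT = re.compile(r"AC[ACGT]GT")


def cleave_upper_strand(seq):
    """Split an upper-strand sequence after the n of the first ACnGT site.

    Returns (upstream, downstream), or None when no site is present.
    """
    match = ACNGT.search(seq.upper())
    if match is None:
        return None
    cut = match.start() + 3  # A, C and n stay on the upstream fragment
    return seq[:cut], seq[cut:]


# Placeholder sequence: ...GTG TAT TAC TGT GCG AGA (Val Tyr Tyr Cys Ala Arg).
toy_fr3_end = "GCCGTGTATTACTGTGCGAGA"

print(cleave_upper_strand(toy_fr3_end))  # ('GCCGTGTATTACT', 'GTGCGAGA')
```

In this toy example the site spans the last two bases of TAC (Tyr) and all of TGT (Cys), so the downstream fragment begins within the conserved cysteine codon and runs toward CDR3.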

Table 250 E shows the distinct
oligonucleotides of length 22 bases (except the last one,
which is of length 20). Table 255 C shows the
analysis of 1617 actual heavy chain antibody genes. Of
these, 1511 have the site and match one of the
candidate oligonucleotides to within 4 mismatches.
Eight oligonucleotides account for most of the matches
and are given in Table 250 F.1. The 8 oligonucleotides
are very similar so that it is likely that satisfactory
cleavage will be achieved with only one oligonucleotide
(such as H43.77.97.1-02#1) by adjusting temperature,
pH, salinity, and the like. One or two
oligonucleotides may likewise suffice whenever the
germline gene sequences differ very little and
especially if they differ very little close to the
restriction endonuclease recognition region to be
cleaved. Table 255 D shows a repeat analysis of 1617
actual heavy chain antibody genes using only the 8
chosen oligonucleotides. This shows that 1463 of the
sequences match at least one of the oligonucleotides to
within 4 mismatches and have the site as expected.
Only 7 sequences have a second HpyCH4III restriction
endonuclease recognition region in this region.
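
The kind of analysis summarized in Tables 255 C and 255 D (does a gene have the site, and does it match some oligonucleotide to within 4 mismatches?) can be sketched as follows. All sequences and oligonucleotide names here are invented placeholders of length 12 rather than the 22-base oligonucleotides of Table 250, and for simplicity the gene window and the oligonucleotides are written in the same strand sense, so a perfect match is zero mismatches. The helper names are hypothetical.

```python
import re

ACNGT = re.compile(r"AC[ACGT]GT")


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_oligo(window, oligos, max_mismatches=4):
    """Closest oligonucleotide as (name, mismatches), or None if too far."""
    scored = [
        (mismatches(window, seq), name)
        for name, seq in oligos.items()
        if len(seq) == len(window)
    ]
    if not scored:
        return None
    distance, name = min(scored)
    return (name, distance) if distance <= max_mismatches else None


def count_cleavable(windows, oligos, max_mismatches=4):
    """Count windows that contain ACnGT and match an oligonucleotide."""
    return sum(
        1
        for window in windows
        if ACNGT.search(window)
        and best_oligo(window, oligos, max_mismatches) is not None
    )


# Invented placeholders, NOT the oligonucleotides of Table 250.
toy_oligos = {"oligo_1": "TATTACTGTGCG", "oligo_2": "TATCACTGTGCA"}
toy_windows = ["TATTACTGTGCG", "TACTACTGTGCA", "GGGGGGGGGGGG"]

print(best_oligo(toy_windows[1], toy_oligos))    # ('oligo_1', 2)
print(count_cleavable(toy_windows, toy_oligos))  # 2
```

Rerunning the same count with a reduced oligonucleotide set corresponds to the repeat analysis described above, and shows how much coverage is lost by using fewer oligonucleotides.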

Another illustration of choosing an
appropriate restriction endonuclease recognition site
involves cleavage in FR1 of human heavy chains.
Cleavage in FR1 allows capture of the entire CDR
diversity of the heavy chain.

The germline genes for human heavy chain FR1
are shown in Table 217. Table 220 shows the
restriction endonuclease recognition sites found in
human germline gene FR1s. The preferred sites (with
occurrences written as number of sites@position) are
BsgI (GTGCAG; 39@4), BsoFI (GCngc; 43@6, 11@9, 2@3, 1@12),
TseI (Gcwgc; 43@6, 11@9, 2@3, 1@12),
MspA1I (CMGckg; 46@7, 2@1), PvuII (CAGctg; 46@7, 2@1),
AluI (AGct; 48@8, 2@2), DdeI (Ctnag; 22@52, 9@48),
HphI (tcacc; 22@80), BssKI (Nccngg; 35@39, 2@40),
BsaJI (Ccnngg; 32@40, 2@41), BstNI (CCwgg; 33@40),
ScrFI (CCngg; 35@40, 2@41), EcoO109I (RGgnccy; 22@46,
11@43), Sau96I (Ggncc; 23@47, 11@44),
AvaII (Ggwcc; 23@47, 4@44), PpuMI (RGgwccy; 22@46, 4@43),
BsmFI (gtccc; 20@48), HinfI (Gantc; 34@16, 21@56, 21@77),
TfiI (21@77), MlyI (GAGTC; 34@16), MlyI (gactc; 21@56), and
AlwNI (CAGnnnctg; 22@68). The more preferred sites are
MspA1I and PvuII. MspA1I and PvuII have 46 sites at 7-12
and 2 at 1-6. To avoid cleavage at both sites,
oligonucleotides are used that do not fully cover the
site at 1-6. Thus, the DNA will not be cleaved at that
site. We have shown that DNA that extends 3, 4, or 5
bases beyond a PvuII-site can be cleaved efficiently.
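
A site survey of this kind can be sketched in a few lines. The
following is a minimal illustration, assuming the germline FR1
sequences are available as plain strings that all start at the
same base; the three toy sequences and the function names are
hypothetical. Degenerate recognition sequences are expanded
using the standard IUPAC nucleotide codes.

```python
import re
from collections import Counter

# Standard IUPAC nucleotide codes needed for the sites discussed here.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}


def site_regex(site: str):
    """Compile a recognition sequence into a regex; the lookahead
    lets overlapping occurrences be reported."""
    return re.compile("(?=" + "".join(IUPAC[c] for c in site.upper()) + ")")


def tally_site_positions(genes, site):
    """Map each 1-based start position to the number of sites found there."""
    pattern = site_regex(site)
    counts = Counter()
    for seq in genes:
        for match in pattern.finditer(seq.upper()):
            counts[match.start() + 1] += 1
    return counts


# Toy FR1-like sequences (hypothetical, for illustration only).
genes = ["GAGGTGCAGCTGGTG", "CAGGTGCAGCTGGTG", "CAGCTGCAGCTGGTG"]
pvuii = tally_site_positions(genes, "CAGCTG")
mspa1i = tally_site_positions(genes, "CMGCKG")
```

Here `pvuii` records three sites starting at base 7 and one
starting at base 1, which is the same shape of result as the
"46 sites at 7-12 and 2 at 1-6" summary above.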

Another illustration of choosing an
appropriate restriction endonuclease recognition site
involves cleavage in FR1 of human kappa light chains.
Table 300 shows the human kappa FR1 germline genes and
Table 302 shows restriction endonuclease recognition
sites that are found in a substantial number of human
kappa FR1 germline genes at consistent locations. Of
the restriction endonuclease recognition sites listed,
BsmAI and PflFI are the most preferred enzymes. BsmAI
sites are found at base 18 in 35 of 40 germline genes.
PflFI sites are found in 35 of 40 germline genes at
base 12.

Another example of choosing an appropriate
restriction endonuclease recognition site involves
cleavage in FR1 of the human lambda light chain. Table
400 shows the 31 known human lambda FR1 germline gene
sequences. Table 405 shows restriction endonuclease
recognition sites found in human lambda FR1 germline
genes. HinfI and DdeI are the most preferred
restriction endonucleases for cutting human lambda
chains in FR1.

After the appropriate site or sites for
cleavage are chosen, one or more short oligonucleotides
are prepared so as to functionally complement, alone or
in combination, the chosen recognition site. The
oligonucleotides also include sequences that flank the
recognition site in the majority of the amplified
genes. This flanking region allows the sequence to
anneal to the single-stranded DNA sufficiently to allow
cleavage by the restriction endonuclease specific for
the site chosen.

The actual length and sequence of the
oligonucleotide depend on the recognition site and the
conditions to be used for contacting and cleavage. The
length must be sufficient so that the oligonucleotide
is functionally complementary to the single-stranded
DNA over a large enough region to allow the two strands
to associate such that cleavage may occur at the chosen
temperature and solely at the desired location.

Typically, the oligonucleotides of this
preferred method of the invention are about 17 to about
30 nucleotides in length. Below about 17 bases,
annealing is too weak, and above 30 bases there can be a
loss of specificity. A preferred length is 18 to 24
bases.

Oligonucleotides of this length need not be
identical complements of the germline genes. Rather, a
few mismatches may be tolerated. Preferably,
however, no more than 1-3 mismatches are allowed. Such
mismatches do not adversely affect annealing of the
oligonucleotide to the single-stranded DNA. Hence, the
two DNAs are said to be functionally complementary.
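
The length and mismatch limits just described can be expressed
as a simple check. This is a sketch under stated assumptions:
both sequences are written 5' to 3', the target region has
already been trimmed to the length of the oligonucleotide, and
the default limits (17 to 30 bases, at most 3 mismatches) are
taken from the preceding paragraphs. The function names are
hypothetical, and the check ignores temperature and buffer
effects, which in practice are determined empirically.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_functionally_complementary(oligo, target_region,
                                  max_mismatches=3,
                                  min_len=17, max_len=30):
    """True if the oligo pairs with the target region with few mismatches."""
    if not min_len <= len(oligo) <= max_len:
        return False
    if len(oligo) != len(target_region):
        return False
    perfect = reverse_complement(target_region)
    differences = sum(a != b for a, b in zip(oligo.upper(), perfect))
    return differences <= max_mismatches


# Toy target (hypothetical, for illustration only).
target = "ACGTACGTACGTACGTAC"
perfect_oligo = reverse_complement(target)
```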

The second method to manipulate the amplified
single-stranded DNAs of this invention for cloning
comprises the steps of:

(i) contacting the nucleic acid with a
partially double-stranded oligonucleotide,
the single-stranded region of the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired, and the
double-stranded region of the oligonucleotide
having a Type II-S restriction endonuclease
recognition site, whose cleavage site is
located at a known distance from the
recognition site; and

(ii) cleaving the nucleic acid solely at
the cleavage site formed by the
complementation of the nucleic acid and the
single-stranded region of the
oligonucleotide;

the contacting and the cleaving steps being performed
at a temperature sufficient to maintain the nucleic
acid in substantially single-stranded form, the
oligonucleotide being functionally complementary to the
nucleic acid over a large enough region to allow the
two strands to associate such that cleavage may occur
at the chosen temperature and at the desired location,
and the cleavage being carried out using a restriction
endonuclease that is active at the chosen temperature.

This second method employs Universal
Restriction Endonucleases ("URE"). UREs are partially
double-stranded oligonucleotides. The single-stranded
portion or overlap of the URE consists of a DNA adapter
that is functionally complementary to the sequence to
be cleaved in the single-stranded DNA. The
double-stranded portion consists of a Type II-S restriction
endonuclease recognition site.

The URE method of this invention is specific
and precise and can tolerate some (e.g., 1-3)
mismatches in the complementary regions, i.e., it is
functionally complementary to that region. Further,
conditions under which the URE is used can be adjusted
so that most of the genes that are amplified can be
cut, reducing bias in the library produced from those
genes.

The sequence of the single-stranded DNA
adapter or overlap portion of the URE typically
consists of about 14-22 bases. However, longer or
shorter adapters may be used. The size depends on the
ability of the adapter to associate with its functional
complement in the single-stranded DNA and on the
temperature used for contacting the URE with the
single-stranded DNA and for cleaving the
DNA with the Type II-S enzyme. The adapter must be
functionally complementary to the single-stranded DNA
over a large enough region to allow the two strands to
associate such that the cleavage may occur at the
chosen temperature and at the desired location. We
prefer single-stranded or overlap portions of 14-17
bases in length, and more preferably 18-20 bases in
length.

The site chosen for cleavage using the URE is
preferably one that is substantially conserved in the
family of amplified DNAs. As compared to the first
cleavage method of this invention, these sites do not
need to be endonuclease recognition sites. However,
like the first method, the sites chosen can be
synthetic rather than existing in the native DNA. Such
sites may be chosen by reference to the sequences of
known antibodies or other families of genes. For
example, the sequences of many germline genes are
reported at
http://www.mrc-cpe.cam.ac.uk/imt-doc/restricted/ok.html.
For example, one preferred
site occurs near the end of FR3, from codon 89 through the
second base of codon 93. CDR3 begins at codon 95.

The sequences of 79 human heavy-chain genes
are also available at
http://www.ncbi.nlm.nih.gov/entrez/nucleotide.html.
This site can be used to identify appropriate sequences
for URE cleavage according to the methods of this
invention. See, e.g., Table 8B.

Most preferably, one or more sequences are
identified using these sites or other available
sequence information. These sequences together are
present in a substantial fraction of the amplified
DNAs. For example, multiple sequences could be used to
allow for known diversity in germline genes or for
frequent somatic mutations. Synthetic degenerate
sequences could also be used. Preferably, a
sequence(s) that occurs in at least 65% of genes
examined with no more than 2-3 mismatches is chosen.
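
One way to arrive at such a set of sequences is sketched below.
The greedy selection shown is an assumption made for
illustration and is not a procedure stated in this
specification; the candidate sequences, the gene windows and
the function names are likewise hypothetical. Candidates and
gene windows are assumed to be pre-aligned and of equal length.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def greedy_select(candidates, genes, max_mismatches=3, target_fraction=0.65):
    """Pick candidates one at a time, each time taking the one that
    covers the most still-uncovered genes, until the target is met."""
    uncovered = set(range(len(genes)))
    chosen = []
    while candidates and len(genes) - len(uncovered) < target_fraction * len(genes):
        best = max(
            candidates,
            key=lambda p: sum(hamming(p, genes[i]) <= max_mismatches
                              for i in uncovered),
        )
        newly = {i for i in uncovered
                 if hamming(best, genes[i]) <= max_mismatches}
        if not newly:
            break  # no candidate adds coverage
        chosen.append(best)
        uncovered -= newly
    fraction = 1 - len(uncovered) / len(genes) if genes else 0.0
    return chosen, fraction


# Toy data (hypothetical, for illustration only).
genes = ["AAAAAAAA", "AAAAAAAT", "CCCCCCCC", "GGGGGGGG"]
candidates = ["AAAAAAAA", "CCCCCCCC", "GGGGGGGG"]
chosen, fraction = greedy_select(candidates, genes, max_mismatches=1)
```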

URE single-stranded adapters or overlaps are
then made to be complementary to the chosen regions.
Conditions for using the UREs are determined
empirically. These conditions should allow cleavage of
DNA that contains the functionally complementary
sequences with no more than 2 or 3 mismatches but should
not allow cleavage of DNA lacking such sequences.

As described above, the double-stranded
portion of the URE includes a Type II-S endonuclease
recognition site. Any Type II-S enzyme that is active
at a temperature necessary to maintain the
single-stranded DNA substantially in that form and to allow
the single-stranded DNA adapter portion of the URE to
anneal long enough to the single-stranded DNA to permit
cleavage at the desired site may be used.

The preferred Type II-S enzymes for use in
the URE methods of this invention provide asymmetrical
cleavage of the single-stranded DNA. Among these are
the enzymes listed in Table 800. The most preferred
Type II-S enzyme is FokI.

When the preferred FokI-containing URE is
used, several conditions are preferably used to effect
cleavage:

1) Excess of the URE over target DNA should be
present to activate the enzyme. URE present
only in equimolar amounts to the target DNA
would yield poor cleavage of ssDNA because
the amount of active enzyme available would
be limiting.

2) An activator may be used to activate part of
the FokI enzyme to dimerize without causing
cleavage. Examples of appropriate activators
are shown in Table 510.

3) The cleavage reaction is performed at a
temperature between 45°C and 75°C, preferably
above 50°C and most preferably above 55°C.

The UREs used in the prior art contained a
14-base single-stranded segment, a 10-base stem
(containing a FokI site), followed by the palindrome of
the 10-base stem. While such UREs may be used in the
methods of this invention, the preferred UREs of this
invention also include a segment of three to eight
bases (a loop) between the FokI restriction
endonuclease recognition site containing segments. In
the preferred embodiment, the stem (containing the FokI
site) and its palindrome are also longer than 10 bases.
Preferably, they are 10-14 bases in length. Examples
of these "lollipop" URE adapters are shown in Table 5.

One example of using a URE to cleave a
single-stranded DNA involves the FR3 region of human
heavy chain. Table 508 shows an analysis of 840
full-length mature human heavy chains with the URE
recognition sequences shown. The vast majority
(718/840=0.85) will be recognized with 2 or fewer
mismatches using five UREs (VHS881-1.1, VHS881-1.2,
VHS881-2.1, VHS881-4.1, and VHS881-9.1). Each has a
20-base adaptor sequence to complement the germline
gene, a ten-base stem segment containing a FokI site, a
five-base loop, and the reverse complement of the first
stem segment. Annealing those adapters, alone or in
combination, to single-stranded antisense heavy chain
DNA and treating with FokI in the presence of, e.g.,
the activator FOKIact, will lead to cleavage of the
antisense strand at the position indicated.

Another example of using a URE(s) to cleave a
single-stranded DNA involves the FR1 region of the
human kappa light chains. Table 512 shows an analysis
of 182 full-length human kappa chains for matching by
the four 19-base probe sequences shown. Ninety-six
percent of the sequences match one of the probes with 2
or fewer mismatches. The URE adapters shown in Table
512 are for cleavage of the sense strand of kappa
chains. Thus, the adaptor sequences are the reverse
complement of the germline gene sequences. The URE
consists of a ten-base stem, a five-base loop, the
reverse complement of the stem and the complementation
sequence. The loop shown here is TTGTT, but other
sequences could be used. Its function is to interrupt
the palindrome of the stems so that formation of a
lollipop monomer is favored over dimerization. Table
512 also shows where the sense strand is cleaved.
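
The layout of such a URE (stem, loop, reverse complement of the
stem, then the complementation sequence) can be sketched as
follows. The stem and the 19-base adapter used here are
hypothetical placeholders and are not the sequences of Table
512; the only fixed inputs are the TTGTT loop named above and
the FokI recognition sequence 5'-GGATG-3'. The function names
are likewise illustrative.

```python
FOKI_SITE = "GGATG"  # FokI recognition sequence, read 5'->3'
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def build_lollipop_ure(stem: str, loop: str, adapter: str) -> str:
    """Concatenate stem, loop, reverse-complement stem and adapter.

    The stem must carry the FokI site on one of its two strands so
    that the folded hairpin presents a double-stranded site.
    """
    stem, loop, adapter = stem.upper(), loop.upper(), adapter.upper()
    stem_rc = reverse_complement(stem)
    if FOKI_SITE not in stem and FOKI_SITE not in stem_rc:
        raise ValueError("stem does not contain a FokI recognition site")
    return stem + loop + stem_rc + adapter


# Hypothetical stem and adapter, for illustration only.
ure = build_lollipop_ure("CACGGATGTG", "TTGTT", "ACGTACGTACGTACGTACG")
```

Because the two stem segments are separated by the loop, the
oligonucleotide can fold back on itself rather than pair with a
second copy, which is the purpose of the loop described above.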

Another example of using a URE to cleave a
single-stranded DNA involves the human lambda light
chain. Table 515 shows analysis of 128 human lambda
light chains for matching the four 19-base probes
shown. With three or fewer mismatches, 88 of 128 (69%)
of the chains match one of the probes. Table 515 also
shows URE adapters corresponding to these probes.
Annealing these adapters to upper-strand ssDNA of
lambda chains and treatment with FokI in the presence
of FOKIact at a temperature at or above 45°C will lead
to specific and precise cleavage of the chains.

The conditions under which the short
oligonucleotide sequences of the first method and the
UREs of the second method are contacted with the
single-stranded DNAs may be empirically determined.
The conditions must be such that the single-stranded
DNA remains in substantially single-stranded form.
More particularly, the conditions must be such that the
single-stranded DNA does not form loops that may
interfere with its association with the oligonucleotide
sequence or the URE or that may themselves provide
sites for cleavage by the chosen restriction
endonuclease.

The effectiveness and specificity of short
oligonucleotides (first method) and UREs (second
method) can be adjusted by controlling the
concentrations of the URE adapters/oligonucleotides and
substrate DNA, the temperature, the pH, the
concentration of metal ions, the ionic strength, the
concentration of chaotropes (such as urea and
formamide), the concentration of the restriction
endonuclease (e.g., FokI), and the time of the
digestion. These conditions can be optimized with
synthetic oligonucleotides having: 1) target germline
gene sequences, 2) mutated target gene sequences, or 3)
somewhat related non-target sequences. The goal is to
cleave most of the target sequences and minimal amounts
of non-targets.

In the preferred embodiment of this
invention, the single-stranded DNA is maintained in
substantially that form using a temperature between
45°C and 75°C. More preferably, a temperature between
50°C and 60°C, most preferably between 55°C and 60°C,
is used. These temperatures are employed both when
contacting the DNA with the oligonucleotide or URE and
when cleaving the DNA using the methods of this
invention.

The two cleavage methods of this invention
have several advantages. The first method allows the
individual members of the family of single-stranded
DNAs to be cleaved solely at one substantially
conserved endonuclease recognition site. The method
also does not require an endonuclease recognition site
to be built into the reverse transcription or
amplification primers. Any native or synthetic site in
the family can be used.

The second method has both of these
advantages. In addition, the URE method allows the
single-stranded DNAs to be cleaved at positions where
no endonuclease recognition site naturally occurs or
has been synthetically constructed.

Most importantly, both cleavage methods
permit the use of 5' and 3' primers so as to maximize
diversity, followed by cleavage to remove unwanted or
deleterious sequences before cloning and display.

After cleavage of the amplified DNAs using
one of the methods of this invention, the DNA is
prepared for cloning. This is done by using a
partially duplexed synthetic DNA adapter, whose
terminal sequence is based on the specific cleavage
site at which the amplified DNA has been cleaved.

The synthetic DNA is designed such that when
it is ligated to the cleaved single-stranded DNA, it
allows that DNA to be expressed in the correct reading
frame so as to display the desired peptide, polypeptide
or protein on the surface of the genetic package.
Preferably, the double-stranded portion of the adapter
comprises the sequence of several codons that encode
the amino acid sequence characteristic of the family of
peptides, polypeptides or proteins up to the cleavage
site. For human heavy chains, the amino acids of the
3-23 framework are preferably used to provide the
sequences required for expression of the cleaved DNA.

Preferably, the double-stranded portion of
the adapter is about 12 to 100 bases in length. More
preferably, about 20 to 100 bases are used. The
double-stranded region of the adapter also preferably
contains at least one endonuclease recognition site
useful for cloning the DNA into a suitable display
vector (or a recipient vector used to archive the
diversity). This endonuclease restriction site may be
native to the germline gene sequences used to extend
the DNA sequence. It may also be constructed using
degenerate sequences to the native germline gene
sequences. Or, it may be wholly synthetic.

The single-stranded portion of the adapter is
complementary to the region of the cleavage in the
single-stranded DNA. The overlap can be from about 2
bases up to about 15 bases. The longer the overlap,
the more efficient the ligation is likely to be. A
preferred length for the overlap is 7 to 10. This
allows some mismatches in the region so that diversity
in this region may be captured.

The single-stranded region or overlap of the
partially duplexed adapter is advantageous because it
allows DNA cleaved at the chosen site, but not other
fragments, to be captured. Such fragments would
contaminate the library with genes encoding sequences
that will not fold into proper antibodies and are
likely to be non-specifically sticky.

One illustration of the use of a partially
duplexed adaptor in the methods of this invention
involves ligating such adaptor to a human FR3 region
that has been cleaved, as described above, at
5'-ACnGT-3' using HpyCH4III, Bst4CI or TaaI.

Table 250 F.2 shows the bottom strand of the
double-stranded portion of the adaptor for ligation to
the cleaved bottom-strand DNA. Since the
HpyCH4III-site is so far to the right (as shown in Table 206), a
sequence that includes the AflII-site as well as the
XbaI-site can be added. This bottom strand portion of
the partially-duplexed adaptor, H43.XAExt,
incorporates both XbaI- and AflII-sites. The top strand
of the double-stranded portion of the adaptor has
neither site (due to planned mismatches in the segments
opposite the XbaI- and AflII-sites of H43.XAExt), but
will anneal very tightly to H43.XAExt. H43AExt
contains only the AflII-site and is to be used with the
top strands H43.ABr1 and H43.ABr2 (which have
intentional alterations to destroy the AflII-site).

After ligation, the desired, captured DNA can
be PCR amplified again, if desired, using in the
preferred embodiment a primer to the downstream
constant region of the antibody gene and a primer to
part of the double-stranded region of the adapter. The
primers may also carry restriction endonuclease sites
for use in cloning the amplified DNA.

After ligation, and perhaps amplification, of
the partially double-stranded adapter to the
single-stranded amplified DNA, the composite DNA is cleaved at
chosen 5' and 3' endonuclease recognition sites.

The cleavage sites useful for cloning depend
on the phage or phagemid into which the cassette will
be inserted and the available sites in the antibody
genes. Table 1 provides restriction endonuclease data
for 75 human light chains. Table 2 shows corresponding
data for 79 human heavy chains. In each Table, the
endonucleases are ordered by increasing frequency of
cutting. In these Tables, Nch is the number of chains
cut by the enzyme and Ns is the number of sites (some
chains have more than one site).
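
The two statistics can be computed as sketched below. This is
an illustration only: the chain sequences are hypothetical toy
strings, the function names are not from this specification,
and only non-degenerate palindromic sites (ApaLI, GTGCAC, and
XbaI, TCTAGA) are handled, so searching one strand suffices.

```python
def count_sites(seq: str, site: str) -> int:
    """Count occurrences of site in seq, allowing overlaps."""
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def cut_statistics(chains, sites):
    """Return (enzyme, Nch, Ns) rows ordered by increasing frequency."""
    rows = []
    for enzyme, site in sites.items():
        per_chain = [count_sites(chain.upper(), site.upper())
                     for chain in chains]
        nch = sum(1 for n in per_chain if n > 0)  # chains cut at least once
        ns = sum(per_chain)                       # total number of sites
        rows.append((enzyme, nch, ns))
    return sorted(rows, key=lambda row: (row[1], row[2]))


# Toy data (hypothetical, for illustration only).
sites = {"ApaLI": "GTGCAC", "XbaI": "TCTAGA"}
chains = ["AAGTGCACTTTCTAGA", "GTGCACGTGCAC", "AAAAAAAA"]
table = cut_statistics(chains, sites)
```

Enzymes that appear first in such a table cut the fewest chains
and are therefore the most attractive candidates for cloning
sites.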

From this analysis, SfiI, NotI, AflII, ApaLI,
and AscI are very suitable. SfiI and NotI are
preferably used in pCES1 to insert the heavy-chain
display segment. ApaLI and AscI are preferably used in
pCES1 to insert the light-chain display segment.

BstEII-sites occur in 97% of germ-line JH
genes. In rearranged V genes, only 54/79 (68%) of
heavy-chain genes contain a BstEII-site and 7 of
these 54 contain two sites. Thus, 47/79 (59%) contain a
single BstEII-site. An alternative to using BstEII is
to cleave via UREs at the end of JH and ligate to a
synthetic oligonucleotide that encodes part of CH1.

One example of preparing a family of DNA
sequences using the methods of this invention involves
capturing human CDR3 diversity. As described above,
mRNAs from various autoimmune patients are reverse
transcribed into lower strand cDNA. After the top
strand RNA is degraded, the lower strand is immobilized
and a short oligonucleotide is used to cleave the cDNA
upstream of CDR3. A partially duplexed synthetic DNA
adapter is then annealed to the DNA and the DNA is
amplified using a primer to the adapter and a primer to
the constant region (after FR4). The DNA is then
cleaved using BstEII (in FR4) and a restriction
endonuclease appropriate to the partially
double-stranded adapter (e.g., XbaI and AflII (in FR3)). The
DNA is then ligated into a synthetic VH skeleton such
as 3-23.

One example of preparing a single-stranded
DNA that was cleaved using the URE method involves the
human kappa chain. The cleavage site in the sense
strand of this chain is depicted in Table 512. The
oligonucleotide kapextURE is annealed to the
oligonucleotides (kaBR01UR, kaBR02UR, kaBR03UR, and
kaBR04UR) to form a partially duplex DNA. This DNA is
then ligated to the cleaved soluble kappa chains. The
ligation product is then amplified using primers
kapextUREPCR and CKForeAsc (which inserts an AscI site
after the end of C kappa). This product is then
cleaved with ApaLI and AscI and ligated to similarly
cut recipient vector.

Another example involves the cleavage
illustrated in Table 515. After cleavage, an extender
(ON_LamEx133) and four bridge oligonucleotides
(ON_LamB1-133, ON_LamB2-133, ON_LamB3-133, and ON_LamB4-133) are
annealed to form a partially duplex DNA. That DNA is
ligated to the cleaved lambda-chain sense strands.
After ligation, the DNA is amplified with ON_Lam133PCR
and a forward primer specific to the lambda constant
domain, such as CL2ForeAsc or CL7ForeAsc (Table 130).

In human heavy chains, one can cleave almost
all genes in FR4 (downstream, i.e., toward the 3' end of
the sense strand, of CDR3) at a BstEII-site that occurs
at a constant position in a very large fraction of
human heavy-chain V genes. One then needs a site in
FR3, if only CDR3 diversity is to be captured, in FR2,
if CDR2 and CDR3 diversity is wanted, or in FR1, if all
the CDR diversity is wanted. These sites are
preferably inserted as part of the partially
double-stranded adaptor.

The preferred process of this invention is to
provide recipient vectors having sites that allow
cloning of either light or heavy chains. Such vectors
are well known and widely used in the art. A preferred
phage display vector in accordance with this invention
is phage MALIA3. This displays in gene III. The
sequence of the phage MALIA3 is shown in Table 120A
(annotated) and Table 120B (condensed).

The DNA encoding the selected regions of the
light or heavy chains can be transferred to the vectors
using endonucleases that cut either light or heavy
chains only very rarely. For example, light chains may
be captured with ApaLI and AscI. Heavy-chain genes are
preferably cloned into a recipient vector having SfiI,
NcoI, XbaI, AflII, BstEII, ApaI, and NotI sites. The
light chains are preferably moved into the library as
ApaLI-AscI fragments. The heavy chains are preferably
moved into the library as SfiI-NotI fragments.

Most preferably, the display is on the
surface of a derivative of M13 phage. The most
preferred vector contains all the genes of M13, an
antibiotic resistance gene, and the display cassette.
The preferred vector is provided with restriction sites
that allow introduction and excision of members of the
diverse family of genes, as cassettes. The preferred
vector is stable against rearrangement under the growth
conditions used to amplify phage.

In another embodiment of this invention, the
diversity captured by the methods of the present
invention may be displayed in a phagemid vector (e.g.,
pCES1) that displays the peptide, polypeptide or
protein on the III protein. Such vectors may also be
used to store the diversity for subsequent display
using other vectors or phage.

In another embodiment, the mode of display
may be through a short linker to one of three possible
anchor domains: the first anchor domain is the final
portion of M13 III ("IIIstump"), the second is the full
length III mature protein, and the third is the M13
VIII mature protein.

The IIIstump fragment contains enough of M13
III to assemble into phage but not the domains involved
in mediating infectivity. Because the w.t. III and
VIII proteins are present, the phage is unlikely to
delete the antibody genes, and phage that do delete
these segments receive only a very small growth
advantage. For each of the anchor domains, the DNA
encodes the w.t. AA sequence, but differs from the w.t.
DNA sequence to a very high extent. This will greatly
reduce the potential for homologous recombination
between the display anchor and the w.t. gene that is
also present.

Most preferably, the present invention uses a
complete phage carrying an antibiotic-resistance gene
(such as an ampicillin-resistance gene) and the display
cassette. Because the w.t. iii and viii genes are
present, the w.t. proteins are also present. The
display cassette is transcribed from a regulatable
promoter (e.g., PLacZ). Use of a regulatable promoter
allows control of the ratio of the fusion display gene
to the corresponding w.t. coat protein. This ratio
determines the average number of copies of the display
fusion per phage (or phagemid) particle.

Another aspect of the invention is a method
of displaying peptides, polypeptides or proteins (and
particularly Fabs) on filamentous phage. In the most
preferred embodiment this method displays Fabs and
comprises:

a) obtaining a cassette capturing a diversity of
segments of DNA encoding the elements:

Preg::RBS1::SS1::VL::CL::stop::RBS2::SS2::VH::CH1::
linker::anchor::stop::,

where Preg is a regulatable promoter, RBS1 is a first
ribosome binding site, SS1 is a signal sequence
operable in the host strain, VL is a member of a
diverse set of light-chain variable regions, CL is a
light-chain constant region, stop is one or more stop
codons, RBS2 is a second ribosome binding site, SS2 is
a second signal sequence operable in the host strain,
VH is a member of a diverse set of heavy-chain variable
regions, CH1 is an antibody heavy-chain first constant
domain, linker is a sequence of amino acids of one to
about 50 residues, anchor is a protein that will
assemble into the filamentous phage particle and stop
is a second example of one or more stop codons; and

b) positioning that cassette within the phage
genome to maximize the viability of the phage
and to minimize the potential for deletion of
the cassette or parts thereof.
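
The order of elements in the cassette can be captured in a
small data structure, sketched below. The element names follow
the cassette given above; the function name, the placeholder
sequences and the strict upper limit of 50 linker residues
(the text says "about 50") are assumptions made for
illustration.

```python
CASSETTE_ORDER = [
    "Preg", "RBS1", "SS1", "VL", "CL", "stop",
    "RBS2", "SS2", "VH", "CH1", "linker", "anchor", "stop",
]


def assemble_cassette(elements):
    """Join (name, dna) pairs after checking the cassette order.

    The linker must encode a whole number of codons, here limited
    to 1-50 residues as a simplification of "one to about 50".
    """
    names = [name for name, _ in elements]
    if names != CASSETTE_ORDER:
        raise ValueError(f"elements out of order: {names}")
    linker_dna = dict(elements)["linker"]
    residues, remainder = divmod(len(linker_dna), 3)
    if remainder or not 1 <= residues <= 50:
        raise ValueError("linker must encode 1 to about 50 residues")
    return "".join(dna.upper() for _, dna in elements)


# Placeholder sequences (hypothetical, for illustration only).
placeholder = [(name, "atg") for name in CASSETTE_ORDER]
cassette = assemble_cassette(placeholder)
```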

The DNA encoding the anchor protein in the
above preferred cassette should be designed to encode
the same (or a closely related) amino acid sequence as
is found in one of the coat proteins of the phage, but
with a distinct DNA sequence. This is to prevent
unwanted homologous recombination with the w.t. gene.
In addition, the cassette should be placed in the
intergenic region. The positioning and orientation of
the display cassette can influence the behavior of the
phage.

In one embodiment of the invention, a
transcription terminator may be placed after the second
stop of the display cassette above (e.g., Trp). This
will reduce interaction between the display cassette
and other genes in the phage antibody display vector
(PADV).

In another embodiment of the methods of this
invention, the phage or phagemid can display proteins
other than Fab, by replacing the Fab portions indicated
above with other protein genes.

Various hosts can be used for growth of the
display phage or phagemids of this invention. Such
hosts are well known in the art. In the preferred
embodiment, where Fabs are being displayed, the
preferred host should grow at 30°C and be RecA- (to
reduce unwanted genetic recombination) and EndA- (to
make recovery of RF DNA easier). It is also preferred
that the host strain be easily transformed by
electroporation.

XL1-Blue MRF' satisfies most of these
preferences, but does not grow well at 30°C. XL1-Blue
MRF' does grow slowly at 38°C and thus is an acceptable
host. TG-1 is also an acceptable host although it is
RecA+ and EndA+. XL1-Blue MRF' is more preferred for
the intermediate host used to accumulate diversity
prior to final construction of the library.

After display, the libraries of this
invention may be screened using well known and
conventionally used techniques. The selected peptides,
polypeptides or proteins may then be used to treat
disease. Generally, the peptides, polypeptides or
proteins for use in therapy or in pharmaceutical
compositions are produced by isolating the DNA encoding
the desired peptide, polypeptide or protein from the
member of the library selected. That DNA is then used
in conventional methods to produce the peptide,
polypeptide or protein it encodes in appropriate host
cells, preferably mammalian host cells, e.g., CHO
cells. After isolation, the peptide, polypeptide or
protein is used alone or with pharmaceutically
acceptable compositions in therapy to treat disease.

EXAMPLES 

Example 1: Capturing kappa chains with BsmAI

A repertoire of human kappa-chain mRNAs was
prepared by treating total or poly(A+) RNA isolated
from a collection of patients having various autoimmune
diseases with calf intestinal phosphatase to remove the
5'-phosphate from all molecules that have them, such as
ribosomal RNA, fragmented mRNA, tRNA and genomic DNA.
Full length mRNA (containing a protective 7-methyl cap
structure) is unaffected. The RNA is then treated with
tobacco acid pyrophosphatase to remove the cap
structure from full length mRNAs, leaving a
5'-monophosphate group.

Full length mRNAs were modified with an
adaptor at the 5' end and then reverse transcribed and
amplified using the GeneRACE(TM) method and kit
(Invitrogen). A 5' biotinylated primer complementary
to the adaptor and a 3' primer complementary to a
portion of the constant region were used.

Approximately 2 micrograms (µg) of human
kappa-chain (Igkappa) gene RACE material with biotin
attached to the 5'-end of the upper strand was immobilized on
200 microliters (µL) of Seradyn magnetic beads. The
lower strand was removed by washing the DNA with 2
aliquots of 200 µL of 0.1 M NaOH (pH 13), for 3 minutes for
the first aliquot followed by 30 seconds for the second
aliquot. The beads were neutralized with 200 µL of 10
mM Tris (pH 7.5) 100 mM NaCl. The short
oligonucleotides shown in Table 525 were added in 40
NaCl, 10 mM Tris-HCl, 10 mM MgCl2, 1 mM dithiothreitol
pH 7.9) to the dry beads. The mixture was incubated at
95°C for 5 minutes then cooled down to 55°C over 30
minutes. Excess oligonucleotide was washed away with 2
washes of NEB buffer 3 (100 mM NaCl, 50 mM Tris-HCl, 10
mM MgCl2, 1 mM dithiothreitol pH 7.9). Ten units of
BsmAI (NEB) were added in NEB buffer 3 and incubated
for 1 h at 55°C. The cleaved downstream DNA was
collected and purified over a Qiagen PCR purification
column (FIGs. 3 and 4).

A partially double-stranded adaptor was
prepared using the oligonucleotide shown in Table 525.
The adaptor was added to the single-stranded DNA in
100-fold molar excess along with 1000 units of T4 DNA
ligase (NEB) and incubated overnight at 16°C. The
excess oligonucleotide was removed with a Qiagen PCR
purification column. The ligated material was
amplified by PCR using the primers kapPCRtl and kapfor
shown in Table 525 for 10 cycles with the program shown
in Table 530.

The soluble PCR product was run on a gel and
showed a band of approximately 700 n, as expected
(FIGs. 5 and 6). The DNA was cleaved with enzymes
ApaLI and AscI, gel purified, and ligated to similarly
cleaved vector pCES1. The presence of the correct size
insert was checked by PCR in several clones as shown in
FIG. 15.

Table 500 shows the DNA sequence of a kappa
light chain captured by this procedure. Table 501
shows a second sequence captured by this procedure.
The closest bridge sequence was complementary to the
sequence 5'-agccacc-3', but the sequence captured reads
5'-Tgccacc-3', showing that some mismatch in the
overlapped region is tolerated.
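The mismatch noted above can be located with a short comparison. The following is only an illustrative sketch: the two 7-base strings are the ones quoted in the text, while the helper name `hamming_mismatches` is an assumption introduced here and is not part of the described procedure.

```python
def hamming_mismatches(a: str, b: str) -> list[int]:
    """Return 0-based positions at which two equal-length sequences differ.

    The comparison is case-insensitive, because the text uses upper case
    only to highlight the base that differs.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a.upper(), b.upper())) if x != y]


# Sequence the closest bridge oligonucleotide was complementary to
expected = "agccacc"
# Sequence actually captured (second captured sequence, Table 501)
captured = "Tgccacc"

mismatches = hamming_mismatches(expected, captured)
print(f"{len(mismatches)} mismatch(es) at position(s) {mismatches}")
```

Under these assumptions the comparison reports a single mismatch at the first base of the overlapped region, which is the tolerated difference described above.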

Example 2: Construction of Synthetic CDR1 and CDR2
Diversity in V3-23 VH Framework

Synthetic Complementarity Determining Region
(CDR) 1 and 2 diversity was constructed in the 3-23 VH
framework in a two-step process: first, a vector
containing the 3-23 VH framework was constructed, and
then synthetic CDR 1 and 2 diversity was assembled and
cloned into this vector.

For construction of the V3-23 framework, 8
oligos and two PCR primers (long oligonucleotides:
TOPFR1A, BOTFR1B, BOTFR2, BOTFR3, F06, BOTFR4, ON-vgC1, and
ON-vgC2 and primers: SFPRMET and BOTPCRPRIM, shown in
Table 600) that overlap were designed based on the
GenBank sequence of V3-23 VH. The design incorporated
at least one useful restriction site in each framework
region, as shown in Table 600. In Table 600, the
segments that were synthesized are shown as bold, the
overlapping regions are underscored, and the PCR
priming regions at each end are underscored. A mixture
of these 8 oligos was combined at a final concentration
of 2.5 µM in a 20 µl Polymerase Chain Reaction (PCR)
reaction. The PCR mixture contained 200 µM dNTPs, 2.5 mM
MgCl2, 0.02 U Pfu Turbo™ DNA Polymerase, 1 U Qiagen
HotStart Taq DNA Polymerase, and 1X Qiagen PCR buffer.
The PCR program consisted of 10 cycles of 94°C for 30 s,
55°C for 30 s, and 72°C for 30 s. The assembled V3-23
DNA sequence was then amplified, using 2.5 µl of a
10-fold dilution from the initial PCR in a 100 µl PCR
reaction. The PCR reaction contained 200 µM dNTPs,
2.5 mM MgCl2, 0.02 U Pfu Turbo™ DNA Polymerase, 1 U Qiagen
HotStart Taq DNA Polymerase, 1X Qiagen PCR Buffer and 2
outside primers (SFPRMET and BOTPCRPRIM) at a
concentration of 1 µM. The PCR program consisted of 23
cycles at 94°C for 30 s, 55°C for 30 s, and 72°C for 60 s.
The V3-23 VH DNA sequence was digested and cloned into
pCES1 (phagemid vector) using the SfiI and BstEII
restriction endonuclease sites. (All restriction enzymes
mentioned herein were supplied by New England BioLabs,
Beverly, MA and used as per manufacturer's
instructions.)
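Assembly from overlapping oligonucleotides relies on the 3' end of each top-strand oligonucleotide being complementary to the 3' end of the neighboring bottom-strand oligonucleotide. The sketch below shows one way such an overlap could be checked before ordering oligonucleotides. The sequences are short toy strings chosen for illustration, not the oligonucleotides of Table 600, and the function names are assumptions introduced here.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(left: str, right: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    for n in range(max_len, min_len - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


# Toy top-strand oligonucleotide, written 5'->3' (hypothetical).
top = "GAGGTGCAGCTGTTGGAGTCTGG"
# Toy bottom-strand oligonucleotide, written 5'->3' (hypothetical); it is
# defined here as the reverse complement of the top-strand stretch it covers.
bottom = reverse_complement("GAGTCTGGGGGAGGCTTGGTACAG")

# Read the bottom oligonucleotide in top-strand orientation, then measure how
# many bases at the 3' end of `top` it shares with the start of that stretch.
overlap = suffix_prefix_overlap(top, reverse_complement(bottom))
print(f"annealing overlap: {overlap} bases")
```

A real design would also weigh the melting temperature of each overlap against the 55°C annealing step; this sketch checks only that an overlap exists and how long it is.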

Stuffer sequences (shown in Table 610 and
Table 620) were introduced into pCES1 to replace
CDR1/CDR2 sequences (900 bases between BspEI and XbaI
RE sites) and CDR3 sequences (358 bases between AflII
and BstEII), prior to cloning the CDR1/CDR2 diversity.
The new vector is pCES5 and its sequence is given in
Table 620. Having stuffers in place of the CDRs avoids
the risk that a parental sequence would be
over-represented in the library. The CDR1-2 stuffer
contains restriction sites for BglII, Bsu36I, BclI,
XcmI, MluI, PvuII, HpaI, and HincII, the underscored
sites being unique within the vector pCES5. The
stuffer that replaces CDR3 contains the unique
restriction endonuclease site RsrII. The stuffer
sequences are fragments from the penicillase gene of
E. coli.
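Whether a restriction site is unique within a vector can be checked by counting its occurrences in the vector sequence. The sketch below is a minimal illustration under stated assumptions: the sequence is a short made-up string rather than pCES5, the sequence is treated as linear, and only non-degenerate palindromic sites are handled, so scanning one strand is enough.

```python
def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of `site` in `sequence`, case-insensitively.

    Overlapping occurrences are counted. Degenerate bases (N, W, Y, ...)
    and non-palindromic sites are not handled in this sketch.
    """
    seq, pat = sequence.upper(), site.upper()
    return sum(
        1 for i in range(len(seq) - len(pat) + 1) if seq[i:i + len(pat)] == pat
    )


def is_unique(sequence: str, site: str) -> bool:
    """True when `site` occurs exactly once in `sequence`."""
    return count_sites(sequence, site) == 1


# Hypothetical toy sequence: one MluI site (ACGCGT), two HpaI sites (GTTAAC).
toy_vector = "TTGACGCGTAAGTTAACCCGTTAACTT"

for name, site in (("MluI", "ACGCGT"), ("HpaI", "GTTAAC")):
    print(name, count_sites(toy_vector, site), is_unique(toy_vector, site))
```

For a circular vector, one simple extension would be to append the first len(site) - 1 bases to the end of the sequence before counting, so that sites spanning the origin are not missed.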

For the construction of the CDR1 and CDR2
diversity, 4 overlapping oligonucleotides (ON-vgC1,
ON_Brl2, ON_CD2Xba, and ON-vgC2, shown in Table 600
and Table 630) encoding CDR1/2, plus flanking regions,
were designed. A mix of these 4 oligos was combined at
a final concentration of 2.5 µM in a 40 µl PCR reaction.
Two of the 4 oligos contained variegated sequences
positioned at the CDR1 and the CDR2. The PCR mixture
contained 200 µM dNTPs, 2.5 U Pwo DNA Polymerase (Roche),
and 1X Pwo PCR buffer with 2 mM MgSO4. The PCR program
consisted of 10 cycles at 94°C for 30 s, 60°C for 30 s,
and 72°C for 60 s. This assembled CDR1/2 DNA sequence
was amplified, using 2.5 µl of the mixture in a 100 µl PCR
reaction. The PCR reaction contained 200 µM dNTPs, 2.5 U
Pwo DNA Polymerase, 1X Pwo PCR Buffer with 2 mM MgSO4 and
2 outside primers at a concentration of 1 µM. The PCR
program consisted of 10 cycles at 94°C for 30 s, 60°C
for 30 s, and 72°C for 60 s. These variegated sequences
were digested and cloned into the V3-23 framework in
place of the CDR1/2 stuffer.

We obtained approximately 7 x 10^7 independent
transformants. Into this diversity, we can clone CDR3
diversity either from donor populations or from
synthetic DNA.
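The number of independent transformants can be related to how much of a designed diversity is likely to be present in the library. The sketch below uses the standard Poisson approximation for sampling equally likely variants with replacement. The designed-diversity values are hypothetical placeholders, since the theoretical diversity of the CDR1/CDR2 variegation is not stated in this example.

```python
import math


def expected_coverage(transformants: float, theoretical_diversity: float) -> float:
    """Expected fraction of distinct variants present at least once.

    Assumes every variant is equally likely and transformants are sampled
    independently, so the fraction missed is exp(-N / D).
    """
    return 1.0 - math.exp(-transformants / theoretical_diversity)


observed = 7e7  # independent transformants reported in the text

# Hypothetical designed diversities, for illustration only.
for designed in (1e6, 1e8, 1e10):
    fraction = expected_coverage(observed, designed)
    print(f"designed diversity {designed:.0e}: coverage {fraction:.3f}")
```

Real libraries usually deviate from the equal-likelihood assumption because of synthesis and cloning bias, so values from this approximation are best read as an upper bound on coverage.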



It will be understood that the foregoing is
only illustrative of the principles of this invention
and that various modifications can be made by those
skilled in the art without departing from the scope
and spirit of the invention.

We claim:

1. A method for cleaving single-stranded
nucleic acid sequences at a desired location, the
method comprising the steps of:

(i) contacting the nucleic acid with a
single-stranded oligonucleotide, the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired and
including a sequence that with its complement
in the nucleic acid forms a restriction
endonuclease recognition site that on
restriction results in cleavage of the
nucleic acid at the desired location; and

(ii) cleaving the nucleic acid solely at
the recognition site formed by the
complementation of the nucleic acid and the
oligonucleotide;

the contacting and the cleaving steps being performed
at a temperature sufficient to maintain the nucleic
acid in substantially single-stranded form, the
oligonucleotide being functionally complementary to the
nucleic acid over a large enough region to allow the
two strands to associate such that cleavage may occur
at the chosen temperature and at the desired location,
and the cleavage being carried out using a restriction
endonuclease that is active at the chosen temperature.

2. A method for cleaving single-stranded
nucleic acid sequences at a desired location, the
method comprising the steps of:

(i) contacting the nucleic acid with a
partially double-stranded oligonucleotide,
the single-stranded region of the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired, and the
double-stranded region of the oligonucleotide
having a Type II-S restriction endonuclease
recognition site, whose cleavage site is
located at a known distance from the
recognition site; and

(ii) cleaving the nucleic acid solely at
the Type II-S cleavage site formed by the
complementation of the nucleic acid and the
single-stranded region of the
oligonucleotide;

the contacting and the cleaving steps being performed
at a temperature sufficient to maintain the nucleic
acid in substantially single-stranded form, the
oligonucleotide being functionally complementary to the
nucleic acid over a large enough region to allow the
two strands to associate such that cleavage may occur
at the chosen temperature and at the desired location,
and the cleavage being carried out using a restriction
endonuclease that is active at the chosen temperature.



3. In a method for displaying a member of a
diverse family of peptides, polypeptides or proteins on
the surface of a genetic package and collectively
displaying at least a part of the diversity of the
family, the improvement being characterized in that the
displayed peptide, polypeptide or protein is encoded
at least in part by a nucleic acid
that has been cleaved at a desired location by a method
comprising the steps of:

(i) contacting the nucleic acid with a
single-stranded oligonucleotide, the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired and
including a sequence that with its complement
in the nucleic acid forms a restriction
endonuclease recognition site that on
restriction results in cleavage of the
nucleic acid at the desired location; and

(ii) cleaving the nucleic acid solely at
the recognition site formed by the
complementation of the nucleic acid and the
oligonucleotide;

the contacting and the cleaving steps being performed
at a temperature sufficient to maintain the nucleic
acid in substantially single-stranded form, the
oligonucleotide being functionally complementary to the
nucleic acid over a large enough region to allow the
two strands to associate such that cleavage may occur
at the chosen temperature and at the desired location,
and the cleavage being carried out using a restriction
endonuclease that is active at the chosen temperature.



4. In a method for displaying a member of a
diverse family of peptides, polypeptides or proteins on
the surface of a genetic package and collectively
displaying at least a part of the diversity of the
family, the improvement being characterized in that the
displayed peptide, polypeptide or protein is encoded by
a DNA sequence comprising a nucleic acid that has been
cleaved at a desired location by

(i) contacting the nucleic acid with a
partially double-stranded oligonucleotide,
the single-stranded region of the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired, and the
double-stranded region of the oligonucleotide
having a Type II-S restriction endonuclease
recognition site, whose cleavage site is
located at a known distance from the
recognition site; and

(ii) cleaving the nucleic acid solely at
the Type II-S cleavage site formed by the
complementation of the nucleic acid and the
single-stranded region of the
oligonucleotide;

the contacting and the cleaving steps being performed
at a temperature sufficient to maintain the nucleic
acid in substantially single-stranded form, the
oligonucleotide being functionally complementary to the
nucleic acid over a large enough region to allow the
two strands to associate such that cleavage may occur
at the chosen temperature and at the desired location,
and the cleavage being carried out using a restriction
endonuclease that is active at the chosen temperature.

5. A method for displaying a member of a
diverse family of peptides, polypeptides or proteins on
the surface of a genetic package and collectively
displaying at least a part of the diversity of the
family, the method comprising the steps of:

(i) preparing a collection of nucleic acids
that code at least in part for members of the diverse
family;

(ii) rendering the nucleic acids
single-stranded;

(iii) cleaving the single-stranded nucleic
acids at a desired location by a method comprising the
steps of:

(a) contacting the nucleic acid with a
single-stranded oligonucleotide, the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired and
including a sequence that with its complement
in the nucleic acid forms a restriction
endonuclease recognition site that on
restriction results in cleavage of the
nucleic acid at the desired location; and

(b) cleaving the nucleic acid solely at
the recognition site formed by the
complementation of the nucleic acid and the
oligonucleotide;

the contacting and the cleaving steps being
performed at a temperature sufficient to maintain
the nucleic acid in substantially single-stranded
form, the oligonucleotide being functionally
complementary to the nucleic acid over a large
enough region to allow the two strands to
associate such that cleavage may occur at the
chosen temperature and at the desired location,
and the cleavage being carried out using a
restriction endonuclease that is active at the
chosen temperature; and

(iv) displaying a member of the family of
peptides, polypeptides or proteins coded, at least in
part, by the cleaved nucleic acids on the surface of
the genetic package and collectively displaying at
least a portion of the diversity of the family.

6. A method for displaying a member of a
diverse family of peptides, polypeptides or proteins on
the surface of a genetic package and collectively
displaying at least a portion of the diversity of the
family, the method comprising the steps of:

(i) preparing a collection of nucleic acids
that code, at least in part, for members of the diverse
family;

(ii) rendering the nucleic acids
single-stranded;

(iii) cleaving the single-stranded nucleic
acids at a desired location by a method comprising the
steps of:

(a) contacting the nucleic acid with a
partially double-stranded oligonucleotide,
the single-stranded region of the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired, and the
double-stranded region of the oligonucleotide
having a Type II-S restriction endonuclease
recognition site, whose cleavage site is
located at a known distance from the
recognition site; and

(b) cleaving the nucleic acid solely at
the Type II-S cleavage site formed by the
complementation of the nucleic acid and the



WO 01/79481 



PCT/US01/12454 




single-stranded region of the
oligonucleotide;

the contacting and the cleaving steps being
performed at a temperature sufficient to maintain
the nucleic acid in substantially single-stranded
form, the oligonucleotide being functionally
complementary to the nucleic acid over a large
enough region to allow the two strands to
associate such that cleavage may occur at the
chosen temperature and at the desired location,
and the cleavage being carried out using a
restriction endonuclease that is active at the chosen
temperature; and

(iv) displaying a member of the family of
peptides, polypeptides or proteins coded, at least in
part, by the cleaved nucleic acids on the surface of
the genetic package and collectively displaying at
least a portion of the diversity of the family.

7. A library comprising a collection of
genetic packages that display a member of a diverse
family of peptides, polypeptides or proteins and
collectively display at least a portion of the
diversity of the family, the library being produced
using the methods of claims 3, 4, 5 or 6.

8. A library comprising a collection of
genetic packages that display a member of a diverse
family of peptides, polypeptides or proteins and that
collectively display at least a portion of the family,
the displayed peptides, polypeptides or proteins being
encoded by DNA sequences comprising at least in part
sequences produced by cleaving single-stranded nucleic
acid sequences at a desired location by a method
comprising the steps of:

(i) contacting the nucleic acid with a
single-stranded oligonucleotide, the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired and
including a sequence that with its complement
in the nucleic acid forms a restriction
endonuclease recognition site that on
restriction results in cleavage of the
nucleic acid at the desired location; and

(ii) cleaving the nucleic acid solely at
the recognition site formed by the
complementation of the nucleic acid and the
oligonucleotide;

the contacting and the cleaving steps being performed
at a temperature sufficient to maintain the nucleic
acid in substantially single-stranded form, the
oligonucleotide being functionally complementary to the
nucleic acid over a large enough region to allow the
two strands to associate such that cleavage may occur
at the chosen temperature and at the desired location,
and the cleavage being carried out using a restriction
endonuclease that is active at the chosen temperature.

9. A library comprising a collection of
genetic packages that display a member of a diverse
family of peptides, polypeptides or proteins and that
collectively display at least a portion of the
diversity of the family, the displayed peptides,
polypeptides or proteins being encoded by DNA sequences
comprising at least in part sequences produced by
cleaving single-stranded nucleic acid sequences at a
desired location by a method comprising the steps of:

(i) contacting the nucleic acid with a
partially double-stranded oligonucleotide,
the single-stranded region of the
oligonucleotide being functionally
complementary to the nucleic acid in the
region in which cleavage is desired, and the
double-stranded region of the oligonucleotide
having a Type II-S restriction endonuclease
recognition site, whose cleavage site is
located at a known distance from the
recognition site where the cleavage of the
nucleic acid is desired; and

(ii) cleaving the nucleic acid solely at
the Type II-S cleavage site formed by the
complementation of the nucleic acid and the
single-stranded region of the
oligonucleotide;

the contacting and the cleaving steps being performed
at a temperature sufficient to maintain the nucleic
acid in substantially single-stranded form, the
oligonucleotide being functionally complementary to the
nucleic acid over a large enough region to allow the
two strands to associate such that cleavage may occur
at the chosen temperature and at the desired location,
and the cleavage being carried out using a restriction
endonuclease that is active at the chosen temperature.



10. The methods according to any one of
claims 1 to 9, wherein the nucleic acids encode at
least a portion of an immunoglobulin.

11. The methods according to claim 10,
wherein the immunoglobulin comprises a Fab or single
chain Fv.

12. The methods according to claim 10 or 11,
wherein the immunoglobulin comprises at least a portion of
a heavy chain.

13. The methods according to claim 12,
wherein at least a portion of the heavy chain is human.

14. The methods according to claim 10 or 11,
wherein the immunoglobulin comprises at least a portion
of FR1.

15. The methods according to claim 14,
wherein at least a portion of the FR1 is human.

16. The methods according to claim 10 or 11,
wherein the immunoglobulin comprises at least a portion
of a light chain.

17. The methods according to claim 16,
wherein at least a portion of the light chain is human.

18. The methods according to any one of
claims 1 to 9, wherein the nucleic acid sequences are
at least in part derived from patients suffering from
at least one autoimmune disease and/or cancer.

19. The methods according to claim 18,
wherein the autoimmune disease is selected from the
group comprising lupus erythematosus, systemic
sclerosis, rheumatoid arthritis, antiphospholipid
syndrome or vasculitis.

20. The methods according to claim 18,
wherein the nucleic acids are at least in part isolated
from the group comprising peripheral blood cells, bone
marrow cells, spleen cells or lymph node cells.

21. The methods according to claim 5 or 6
further comprising a nucleic acid amplification step
between steps (i) and (ii), between steps (ii) and
(iii) or between steps (iii) and (iv).

22. The methods according to claim 21,
wherein the amplification step uses geneRACE™.

23. The methods according to any one of
claims 1 to 9, wherein the temperature is between 45°C
and 75°C.

24. The methods according to claim 23,
wherein the temperature is between 50°C and 60°C.

25. The methods according to claim 24,
wherein the temperature is between 55°C and 60°C.

26. The methods according to claim 1, 3, 5
or 8, wherein the length of the single-stranded
oligonucleotide is between 17 and 30 bases.

27. The methods according to claim 26,
wherein the length of the single-stranded
oligonucleotide is between 18 and 24 bases.

28. The methods according to claim 1, 3, 5
or 8, wherein the restriction endonuclease is selected
from the group comprising MaeIII, Tsp45I, HphI, BsaJI,
AluI, BlpI, DdeI, BglII, MslI, BsiEI, EaeI, EagI,
HaeIII, Bst4CI, HpyCH4III, HinfI, MlyI, PleI, MnlI,
HpyCH4V, BsmAI, BpmI, XmnI, or SacI.

29. The methods according to claim 28,
wherein the restriction endonuclease is selected from
the group comprising Bst4CI, TaaI, HpyCH4III, BlpI,
HpyCH4V or MslI.

30. The methods according to claim 2, 4, 6
or 9, wherein the length of the single-stranded region
of the partially double-stranded oligonucleotide is
between 14 and 22 bases.

31. The methods according to claim 30,
wherein the length of the single-stranded region of the
partially double-stranded oligonucleotide is between 14
and 17 bases.

32. The methods according to claim 31,
wherein the length of the single-stranded region of the
oligonucleotide is between 18 and 20 bases.

33. The methods according to claim 2, 4, 6
or 9, wherein the length of the double-stranded region
of the partially double-stranded oligonucleotide is
between 10 and 14 base pairs formed by a stem and its
palindrome.

34. The methods according to claim 33,
wherein the partially double-stranded oligonucleotide
comprises a loop of 3 to 8 bases between the stem and
the palindrome.

35. The methods according to claim 2, 4, 6
or 9, wherein the Type II-S restriction endonuclease is
selected from the group comprising AarICAC, AceIII,
Bbr7I, BbvI, BbvII, Bce83I, BceAI, BcefI, BciVI, BfiI,
BinI, BscAI, BseRI, BsmFI, BspMI, EciI, Eco57I, FauI,
FokI, GsuI, HgaI, HphI, MboII, MlyI, MmeI, MnlI, PleI,
RleAI, SfaNI, SspD5I, Sth132I, StsI, TaqII, Tth111II,
or UbaPI.

36. The methods according to claim 35,
wherein the Type II-S restriction endonuclease is FokI.

37. A method for preparing single-stranded
nucleic acids for cloning into a vector, the method
comprising the steps of:

(i) contacting a single-stranded nucleic
acid sequence that has been cleaved with a
restriction endonuclease with a partially
double-stranded oligonucleotide, the
single-stranded region of the oligonucleotide being
functionally complementary to the nucleic
acid in the region that remains after
cleavage, the double-stranded region of the
oligonucleotide including any sequences
necessary to return the sequences that remain
after cleavage into proper and original
reading frame for expression and containing a
restriction endonuclease recognition site 5'
of those sequences; and

(ii) cleaving the partially
double-stranded oligonucleotide sequence solely at
the restriction endonuclease recognition site
contained within the double-stranded region
of the partially double-stranded
oligonucleotide.

38. The method according to claim 37,
wherein the length of the single-stranded portion of
the partially double-stranded oligonucleotide is
between 2 and 15 bases.

39. The method according to claim 38,
wherein the length of the single-stranded portion of
the partially double-stranded oligonucleotide is
between 7 and 10 bases.

40. The method according to claim 37,
wherein the length of the double-stranded portion of
the partially double-stranded oligonucleotide is
between 12 and 100 base pairs.

41. The method according to claim 40,
wherein the length of the double-stranded portion of
the partially double-stranded oligonucleotide is
between 20 and 100 base pairs.

Table 1: Cleavage of 75 human light chains.

Enzyme     Recognition*     Nch   Ns
AfeI       AGCgct           0     0
AflII      Cttaag           0     0
AgeI       Accggt           0     0
AscI       GGcgcgcc         0     0
BglII      Agatct           0     0
BsiWI      Cgtacg           0     0
BspDI      ATcgat           0     0
BssHII     Gcgcgc           0     0
BstBI      TTcgaa           0     0
DraIII     CACNNNgtg        0     0
EagI       Cggccg           0     0
FseI       GGCCGGcc         0     0
FspI       TGCgca           0     0
HpaI       GTTaac           0     0
MfeI       Caattg           0     0
MluI       Acgcgt           0     0
NcoI       Ccatgg           0     0
NheI       Gctagc           0     0
NotI       GCggccgc         0     0
NruI       TCGcga           0     0
PacI       TTAATtaa         0     0
PmeI       GTTTaaac         0     0
PmlI       CACgtg           0     0
PvuI       CGATcg           0     0
SacII      CCGCgg           0     0
SalI       Gtcgac           0     0
SfiI       GGCCNNNNnggcc    0     0
SgfI       GCGATcgc         0     0
SnaBI      TACgta           0     0
StuI       AGGcct           0     0
XbaI       Tctaga           0     0


AatII      GACGTc           1     1
AclI       AAcgtt           1     1
AseI       ATtaat           1     1
BsmI       GAATGCN          1     1
BspEI      Tccgga           1     1
DrdI       GACNNNNnngtc     1     1
HindIII    Aagctt           1     1
PciI       Acatgt           1     1
SapI       gaagagc          1     1
ScaI       AGTact           1     1
SexAI      Accwggt          1     1
SpeI       Actagt           1     1
TliI       Ctcgag           1     1
XhoI       Ctcgag           1     1


BcgI       cgannnnnntgc     2     2
BlpI       GCtnagc          2     2
BssSI      Ctcgtg           2     2
BstAPI     GCANNNNntgc      2     2
EspI       GCtnagc          2     2
KasI       Ggcgcc           2     2
PflMI      CCANNNNntgg      2     2
XmnI       GAANNnnttc       2     2



HC PR3 



After LC 



0 Heavy Chain signal 
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ApaLX 


Gfcgcac 


3 


3 


LC sxgi 


JNaGx 


Lrv^^y y C 


-3 






M/taMT 


uccygc 


•3 
J 


o 




OtniTT 






■J 




O a t-T T 

Ksni 


swifts* 

Luy wccg 


-> 


"5 




BsrBI 


GAGcgg 




4 






bLnnXuNIMIl 




A 
*i 




lie 7 1 *7 T 


blAtaC 




A 
ft 






Odd l_ Lt 




4 
*± 




CvsVt T 

opm 


CjCAx LtC 




A 
*t 




C ctr\T 
bSpi 






A 
*i 




TV #*m T 

ACCX 


GTmkac 




c 
O 




dCII 


Tgatca 


D 


c 
3 




OOillU J- 


Nnnnnncraaaca 


5 


5 




BsrGI 


Totaca 


5 


5 




Dral 


TTTaaa 


6 


6 




Ndel 


CAtatg 


6 


6 


HC FR4 


c U a T 


AX X X aao u 


D 


o 




BaroHI 


uvj a i» \* \+ 


7 


7 




e ar T 


UAue X V— 


7 


7 




DwlV J- 


(jXAX wl^lNINJNJNWW 


Q 
0 


o 
o 




DeaDT 

DoaOl 


vjrtX in n lid 


O 

o 


p 




Nsil 


ATGCAt 


Q 

o 


8 




t) on 1 OAT 




Q 


Q 


uax 






Q 


Q 


Lax 


PspOOMI 


Gggccc 


9 


9 




BspHI 


Tcatga 


9 


11 




dCOKV 


bAl a lC 


Q 


Q 




7V Vn-J T 

Anal 


TV ATXTVY ^ ^ 

vjAUxn wwnng u c 


XX 


XX 




OJ3S X 


/~7\ 7v r*7ir* 

VjAAuAv^ 


X X 


X H 




DsnT 


Ifp TV ^ a a 


1 9 
X£ 


1 9 
X£ 




oSal 




X o 


XD 




AiUaX 


i-ccggg 


X J 


Xfi 




Aval 


Y F /T TT" /T 

uycgrg 


X«* 


xo 






vjV^L.INlN NNuyyC 


1 A 
X4 


X / 




/■IX WIN X 


l» a v» jn n c u g 


X o 


X P 




RcriMT 




1 7 

X / 


1 Q 

X? 




Ai^iiix 


f^ 1 7A MMMMMnnnnTrrrr 
\+\+ aim im iMrvj tii i uixui t» y y 


1 7 
X / 


- c. D 




n a f t t 
DSuLXX 


Ggtuacc 


X? 




op npi 


SSe8387X 


ccTGCAgg 


zu 


2U 




AvrII      Cctagg           22    22
HincII     GTYrac           22    22
BsgI       GTGCAG           27    29




MSCX 


TGGcca 


*a n 
JU 


J4 




DScKI 










Bsu36I     CCtnagg          35    37
PstI       CTGCAg           35    40
EciI       nnnnnnnnntccgcc  38    40
PpuMI      RGgwccy          41    50
StyI       Ccwwgg           44    73
EcoO109I   RGgnccy          46    70
Acc65I     Ggtacc           50    51
KpnI       GGTACc           50    51
BpmI       ctccag           53    82
AvaII      Ggwcc            71    124





* Cleavage occurs in the top strand after the last upper-case base. For REs
that cut palindromic sequences, the lower strand is cut at the symmetrical
site.

Table 2: Cleavage of 79 human heavy chains

Enzyme     Recognition      Nch   Ns    Planned location of site
AfeI       AGCgct           0     0
AflII      Cttaag           0     0     HC FR3
AscI       GGcgcgcc         0     0     After LC
BsiWI      Cgtacg           0     0
BspDI      ATcgat           0     0
BssHII     Gcgcgc           0     0
FseI       GGCCGGcc         0     0
HpaI       GTTaac           0     0
NheI       Gctagc           0     0
NotI       GCggccgc         0     0
NruI       TCGcga           0     0
NsiI       ATGCAt           0     0
PacI       TTAATtaa         0     0
PciI       Acatgt           0     0
PmeI       GTTTaaac         0     0
PvuI       CGATcg           0     0
RsrII      CGgwccg          0     0
SapI       gaagagc          0     0
SfiI       GGCCNNNNnggcc    0     0
SgfI       GCGATcgc         0     0
SwaI       ATTTaaat         0     0


AclI       AAcgtt           1     1
AgeI       Accggt           1     1
AseI       ATtaat           1     1
AvrII      Cctagg           1     1
BsmI       GAATGCN          1     1
BsrBI      GAGcgg           1     1
BsrDI      GCAATGNNn        1     1
DraI       TTTaaa           1     1
FspI       TGCgca           1     1
HindIII    Aagctt           1     1
MfeI       Caattg           1     1
NaeI       GCCggc           1     1
NgoMIV     Gccggc           1     1
SpeI       Actagt           1     1


Acc65I     Ggtacc           2     2
BstBI      TTcgaa           2     2
KpnI       GGTACc           2     2
MluI       Acgcgt           2     2
NcoI       Ccatgg           2     2
NdeI       CAtatg           2     2
PmlI       CACgtg           2     2
XcmI       CCANNNNNnnnntgg  2     2
BcgI       cgannnnnntgc     3     3
BclI       Tgatca           3     3
BglI       GCCNNNNnggc      3     3
BsaBI      GATNNnnatc       3     3
BsrGI      Tgtaca           3     3
SnaBI      TACgta           3     3
Sse8387I   CCTGCAgg         3     3



HC Linker 

In linker, HC/anchor 



HC signal seq 



In HC signal seq 
HC FR4 
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ApaltX 


Gtgcac 


4 


4 


IiC Sicmai/FRA 


BspHI 


Tcatga 


4 


4 




Bsssr 


Ctccrta 


4 


4 




Psil 


TTAtaa 


4 
•* 






SphI 




** 






Ahdl 


A f* NTJUn r> rr ^ r» 

VJAwLiNmUlU L>W 


c 






osp&x 


xccgga. 


5 


5 


HC FR1 


MSC1 


TGGcca 


5 


5 




e ^ t 


GAGCTc 


5 


5 




oCa. JL 


AGTact 


5 


5 




CnvS T 
DcXnl 


Accwggt 


5 


6 




ospi 


Afti act 


5 


5 




1111 


Ctcgag 


5 


5 




AllOI 


Ctcgag 


5 


5 




DUo J. 


OxiAoAW 


/ 


o 
o 




BstAPI 




/ 


p 




BstZ17I 


GTAtac 


7 


7 




EcoRV 


GATatc 


7 


7 




EcoRX 


Gaattc 


8 


3 




BlpI 


GCtnagc 


9 


9 




Bsu36I 


CCtnagg 


9 


9 




Drain 


CACNNNgtg 


9 


9 




Espl 


GCtnagc 


9 


9 




StuI 


AGGcct 


Q 


1 1 

X o 




Xbal 


To tags. 


Q 


Q 


err* too 
xlw cxU 


Bspl20I 


'ayy www 


xu 


XX 


CHI 




uuidwwG 




XI 


CHI 


IT spUvN X 


v^ggccc 


10 


11 




UUl V X 


T A T ff" miKniKM 
olnl wl^NfUftlfVAIftl 


1 1 


11 




• Q st 1 T 
Oal X 


Gt cgac 


11- 


12 




DrdI 


G AC NMNTMn n rr 1- r» 




-L2 




Kas I 


vjy y ww 


1Z 


1Z 




XioaX 


wwwy yg 


12 


T A 
14 




Bglll 


Atrial" 


1 A 


14 




Hindi 


ox iiau 


ID 


1 Q 
lo 




RamHT 
DcuuTlX 


Ggatcc 


17 


17 




P-flMT 

J; J- -L.1TJ X 


wwnmMMjNii tgg 


1 / 


lb 




D5IQD1 


w nnnnngaga c g 


lo 


21 




OS LAI 


wwAitriciXtriuiigg 


18 


X9 


HC FR2 


Ainnx 


/— ■ 7\ is KTVTn _ +_ 4_ — 

bMNNnnttc 


18 


18 




bacll 


CCGCgg 


19 


19 




r> _ 4_ T 
rStl 


CTGCAg 


20 


24 




rvUll 


CAGctg 


20 


22 




Aval 


Cycgrg 


21 


24 




Zj a y x 


<~gy ccg 


21 


22 




na C-X X 




22 


22 




BspMI 


ACCTGC 




33 




AccI       GTmkac            30   43
StyI       Ccwwgg            36   49
AlwNI      CAGNNNctg         38   44
BsaI       GGTCTCNnnnn       38   44
PpuMI      RGgwccy           43   46
BsgI       GTGCAG            44   54
BseRI      NNnnnnnnnnctcctc  48   60
EciI       nnnnnnnnntccgcc   52   57
BstEII     Ggtnacc           54   61   HC FR4, 47/79 have one
EcoO109I   RGgnccy           54   86
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BpmI       ctccag            60   121
AvaII      Ggwcc             71   140
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Table 5 (amended): Use of FokI as "Universal Restriction Enzyme"

FokI - for dsDNA, | represents sites of cleavage

                     sites of cleavage
5'-cacGGATGtg--nnnnnnn|nnnnnnn-3' (SEQ ID NO: 15)
3'-gtgCCTACac--nnnnnnnnnnn|nnn-5' (SEQ ID NO: 16)
      RECOGNITion of FokI
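
The cleavage geometry drawn above can be checked with a short sketch. It assumes only what the diagram shows: FokI recognizes GGATG and cuts 9 bases downstream of the site on the upper strand and 13 bases downstream on the lower strand. The function name `type_iis_cuts` and its parameters are illustrative, not part of the specification.

```python
def type_iis_cuts(top, site, top_offset, bottom_offset):
    """Locate the first `site` in the upper strand and return the two cut
    positions as 0-based indices into `top`. A cut at index i falls between
    top[i-1] and top[i]; the lower-strand cut is given in upper-strand
    coordinates. Returns None when the site is absent."""
    start = top.upper().find(site.upper())
    if start < 0:
        return None
    end = start + len(site)
    return end + top_offset, end + bottom_offset


# Upper strand of the diagram: cacGGATGtg followed by 14 unspecified bases.
top_strand = "cacGGATGtg" + "n" * 14
top_cut, bottom_cut = type_iis_cuts(top_strand, "GGATG", 9, 13)

left_of_top_cut = top_strand[:top_cut]    # upper strand up to the "|" mark
overhang = bottom_cut - top_cut           # length of the 5' overhang
```

With the diagram's sequence, the upper-strand cut falls after the seventh `n` and the lower-strand cut four bases further on, which reproduces the 7 + 7 and 11 + 3 split of the `n` runs shown in the two strands.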



Case I 



5'-...gtg|tatt-actgtgc..Substrate-3' (SEQ ID NO: 17)

   3'-cac-ataa|tgacacg-\
             gtGTAGGcac\
          5'-caCATCCgtg/ (SEQ ID NO: 18)



Case II 



5'-...gtgtatt|agac-tgc..Substrate-3' (SEQ ID NO: 19)

i — cacataa -tcta | acg-5 ' 
/gtgCCTACac 

\cacGGATGtg-3' (SEQ ID NO: 20)

Case III (Case I rotated 180 degrees) 

/gtgCCTACac-5 • 
\ cacGGATGtq— ) 

gtcitctt 1 acaa-tcc-3 ' Adapter (SEQ ID NO: 21) 
3'- cacagaa-tgtc I agg. .substrate -5'(SEQ ID NO:22) 

Case IV (Case II rotated 180 degrees) 
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3«- gtGTAGGcac\ (SEQ ID NO: 23) 
f—caCATCCgtg/ 
5 ' ~aaa 1 tctc-actqaqc 

Substrate 3 . . ctc-agagl tgactcg. . .-5' (SEQ ID NO: 24) 
Improved FokI adapters

FokI - for dsDNA, | represents sites of cleavage
Case I 

Stem 11, loop 5, stem 11, recognition 17 

5 1 - . . . catgtg I tatt-actgtgc . . Substrate .... -3 * 
3 1 -atacac- ataa I taacacg- t r T— ( 

gtGTAGGcacG T 
5'- caCATCCgtgc C 

Case II 

Stem 10, loop 5, stem 10, recognition 18 

5 • - . . . gtgtatt | agac-tgctgcc . . Substrate .... -3 ' 
r T n " p — cacataa -tctg I acaacaa-5 1 

T gtgCCTACac 
C cacGGATGtg-3 • 

Case III (Case I rotated 180 degrees) 
Stem 11, loop 5, stem 11, recognition 20 

r T n 

T TgtgCCTACac-5 ■ 
G Aca cGGAT Gtoj— j 

L TT J gtgtctt 1 acag-tccattctg-3 ' Adapter 

3 1 - . . . cacagaa-tgtc | aggtaagac . . substrate .... -5 1 

Case IV (Case II rotated 180 degrees) 
Stem 11, loop 4, stem 11, recognition 17 

3'- gtGTAGGcacc T 
f—caCATCCgtgg T 
5 1 -atcgag I tctc-actqaqc 1-T-l 
Substrate 3'-. . . tagctc-agag | tgactcg. . .-5' 
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BseRI

                  | sites of cleavage
5'-cacGAGGAGnnnnnnnnnn|nnnnn-3'
3'-gtgctcctcnnnnnnnn|nnnnnnn-5'
      RECOGNITion of BseRI

Stem 11, loop 5, stem 11, recognition 19 

3 * - gaacat | cg-ttaagccagta 5 1 

fT-T-, cttgta-gc | aattcggtcat-3 ' 

C GCT GAGGAGT C — ' 

T cgactcctcag-5'   An adapter for BseRI to cleave the substrate above
L T — I 
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Table 8: Matches to URE FR3 adapters in 79 human HC. 
A. List of heavy-chain genes sampled

AF008566      AF103343      HSA235676     HSU92452      HSZ93860
AF035043      AF103367      HSA235675     HSU94412      HSZ93863
AF103026      AF103368      HSA235674     HSU94415      MCOMFRAA
AF103033      AF103369      HSA235673     HSU94416      MCOMFRVA
AF103061      AF103370      HSA240559     HSU94417      S82745
AF103072      AF103371      HSCB201       HSU94418      S82764
AF103078      AF103372      HSIGGVHC      HSU96389      S83240
AF103099      AF158381      HSU44791      HSU96391      SABVH369
AF103102      E05213        HSU44793      HSU96392      SADEIGVH
AF 10 J1U J   EUbooo        HSUb2 / / 1   HSUyoJ95      SAHzIGVH
AF103174      E05887        HSU82949      HSZ93849      SDA3IGVH
AF103186      HSA235661     HSU82950      HSZ93850      SIGVHTTD
AF103187      HSA235664     HSU82952      HSZ93851      SUK4IGVH
AF103195      HSA235660     HSU82961      HSZ93853
AF103277      HSA235659     HSU86522      HSZ93855
AF103286      HSA235678     HSU86523      HSZ93857
AF103309      HSA235677










Table 8 B. Testing all distinct GLGs from bases 89.1 to 93.2 of the heavy variable domain

Id          Nb    0    1    2    3    4   Name   Sequence         SEQ ID NO:
1           38   15   11   10    0    2   Seq1   gtgtattactgtgc   25
2           19    7    6    4    2    0   Seq2   gtAtattactgtgc   26
3            1    0    0    1    0    0   Seq3   gtgtattactgtAA   27
4            7    1    5    1    0    0   Seq4   gtgtattactgtAc   28
5            0    0    0    0    0    0   Seq5   Ttgtattactgtgc   29
6            0    0    0    0    0    0   Seq6   TtgtatCactgtgc   30
7            3    1    0    1    1    0   Seq7   ACAtattactgtgc   31
8            2    0    2    0    0    0   Seq8   ACgtattactgtgc   32
9            9    2    2    4    1    0   Seq9   ATgtattactgtgc   33
Group            26   26   21    4    2
Cumulative       26   52   73   77   79







[51] 
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Table 8C Most important URE recognition seqs in FR3 Heavy 

1 VHSzy1 GTGtattactgtgc (ON_SHC103) (SEQ ID NO:25)

2 VHSzy2 GTAtattactgtgc (ON_SHC323) (SEQ ID NO:26)

3 VHSzy4 GTGtattactgtac (ON_SHC349) (SEQ ID NO:28)

4 VHSzy9 ATGtattactgtgc (ON_SHC5a) (SEQ ID NO:33)



Table 8D, testing 79 human HC V genes with four probes

Number of sequences      79

Number of bases       29143

                  Number of mismatches
Id          Best   0    1    2    3    4    5   Name   Sequence
1            39   15   11   10    1    2    0   Seq1   gtgtattactgtgc (SEQ ID NO: 25)
2            22    7    6    5    3    0    1   Seq2   gtAtattactgtgc (SEQ ID NO: 26)
3             7    1    5    1    0    0    0   Seq4   gtgtattactgtAc (SEQ ID NO: 28)
4            11    2    4    4    1    0    0   Seq9   ATgtattactgtgc (SEQ ID NO: 33)
Group             25   26   20    5    2
Cumulative        25   51   71   76   78



One sequence has five mismatches with sequences 2, 4, and 9; it is scored as best for 2. 



Id is the number of the adapter. 

Best is the number of sequence for which the identified adapter was the best available. 
The rest of the table shows how well the sequences match the adapters. For example, there are 11 
sequences that match VHSzy1 (Id=1) with 2 mismatches and are worse for all other adapters. In 
this sample, 90% come within 2 bases of one of the four adapters. 
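
The scoring described above (count mismatches against each adapter, credit the sequence to the best one) can be sketched as follows. This is an illustration, not the program used to build the table: it assumes the 14-base window has already been extracted from each V gene, and it breaks ties in favor of the adapter listed first, which is consistent with the note that a sequence tied between adapters was scored as best for 2. The names `mismatches` and `best_probe` are hypothetical.

```python
# The four URE recognition sequences of Table 8C, lower-cased.
PROBES = {
    "VHSzy1": "gtgtattactgtgc",
    "VHSzy2": "gtatattactgtgc",
    "VHSzy4": "gtgtattactgtac",
    "VHSzy9": "atgtattactgtgc",
}


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a.lower(), b.lower()))


def best_probe(window):
    """Return (mismatch_count, probe_name) for the best-matching probe.
    min() keeps the first minimum, so ties go to the earlier probe."""
    scored = [(mismatches(seq, window), name) for name, seq in PROBES.items()]
    return min(scored, key=lambda pair: pair[0])


# Group row of Table 8D (0..4 mismatches) plus the single five-mismatch
# sequence mentioned in the note under the table.
GROUP = [25, 26, 20, 5, 2, 1]
within_two = sum(GROUP[:3])                      # sequences within 2 bases
percent_within_two = round(100 * within_two / sum(GROUP))
```

Applied to the group totals, the last two lines reproduce the statement that about 90% of the sample comes within two bases of one of the four adapters.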
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Table 130: PCR primers for amplification of human Ab genes
(HuIgMFOR)       5'-tgg aag agg cac gtt ctt ttc ttt-3'
! (HuIgMFOREtop) 5'-aaa gaa aag aac gtg cct ctt cca-3' = reverse complement
(HuCkFOR)        5'-aca ctc tcc cct gtt gaa gct ctt-3'
(HuCL2FOR)       5'-tga aca ttc tgt agg ggc cac tg-3'
(HuCL7FOR)       5'-aga gca ttc tgc agg ggc cac tg-3'
! Kappa
(CKForeAsc)      5'-acc gcc tcc acc ggg cgc gcc tta tta aca ctc tcc cct gtt-
                    gaa gct ctt-3'
(CL2ForeAsc)     5'-acc gcc tcc acc ggg cgc gcc tta tta tga aca ttc tgt-
                    agg ggc cac tg-3'
(CL7ForeAsc)     5'-acc gcc tcc acc ggg cgc gcc tta tta aga gca ttc tgc-
                    agg ggc cac tg-3'
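
Two relationships in this table are easy to check mechanically: the second IgM primer is listed as the reverse complement of HuIgMFOR, and each "ForeAsc" primer reads as a common 5' tail followed by the corresponding short primer. The sketch below assumes the primer sequences with the `c`/`e` OCR confusion resolved (`ctc`, `tcc`, `gcc`, `gct`); the helper `revcomp` and the variable names are illustrative.

```python
# Translation table for complementing DNA while preserving case.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def revcomp(seq):
    """Reverse complement of a DNA string; whitespace between codons is ignored."""
    compact = "".join(seq.split())
    return compact.translate(_COMPLEMENT)[::-1]


hu_igm_for = "tgg aag agg cac gtt ctt ttc ttt"
hu_igm_top = "aaa gaa aag aac gtg cct ctt cca"

# Tail shared by the three ForeAsc primers, followed by the kappa primer.
asc_tail = "acc gcc tcc acc ggg cgc gcc tta tta"
hu_ck_for = "aca ctc tcc cct gtt gaa gct ctt"
ck_fore_asc = "acc gcc tcc acc ggg cgc gcc tta tta aca ctc tcc cct gtt gaa gct ctt"


def compact(seq):
    """Remove the codon spacing used in the table."""
    return "".join(seq.split())
```

The tail contains the sequence `ggcgcgcc`, which is the recognition site of AscI and presumably explains the "Asc" in the primer names.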



Table 195: Human GLG FR3 sequences 
! VH1

! 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 
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agg 


gtc 


ace 


atg acc agg 


gac 


a eg 


tec 


ate 


age 


aca 


gee 


tac 


atq 




! 81 


82 


82a 


82b 82c 83 


84 


85 


86 


87 


88 


89 


90 


91 


92 




gag 


ctg 


age 


aqq ctq aqa 

33 3 3 


tct 


qac 


qac 


acq 


qcc 

3 ^ 


qtq 

3 *~3 


tat 


tac 


tqt 

w 3 w 




! 93 


94 


95 






















5 


gcg 


aga 


ga ! 


l-02# 1 






















aga 


ate 


ace 


att acc agg 


qac 


aca 


tec 


gca 

3 w 3 


aqc 


aca 


qcc 


tac 


atq 

" **3 




gag 


ctg 


aqc 

3 


age ctg aga 


tct 


qaa 


qac 


acq 


qct 


qtq 

3 3 


tat 


tac 


tqt 




gcg 


aga 


ga ! 


l-03# 2 






















aga 


ate 


ace 


atg acc agg 


aac 


acc 


tec 


ata 


aqc 

3 w 


aca 


qcc 


tac 


ata 


10 


gaq 

3^3 


ctq 


aqc 


aqc ctq aqa 

3 3 3 


tct 


qaq 

3 3 


qac 


acq 


qcc 


qtq 

3 3 


tat 


tac 


tqt 




acq 

-3 3 


aga 


qq ! 


l-08# 3 






















aaa 


ate 


ace 


atg acc aca 


aac 


aca 


tec 


acg 


. aac 


aca 


gec 


tac 


atg 




qaq 


ctq 


aqq 


aqc ctq aqa 

3 3 3 


tct 


qac 


qac 


acq 

*** 3 


qcc 

3 


qtq 

3 3 


tat 


tac 


tqt 




gcg 


aga 


ga ! 


1-18# 4 




















15 


aaa 


ate 


ace 


atg acc gag 


qac 


aca 


tct 


aca 


aac 


aca 


acc 

3 


tac 


ata 

** W 3 




gag 


ctg 


age 


age ctg ago. 


tct 


gag 


gac 


acg 


gee 


gtg 


tat 


taC 


eg t. 




gca 


aca 


ga ! 


l-24# 5 






















aga 


gtc 


acc 


att acc agg 


gac 


agg 


tct 


atg 


age 


aca 


gee 


tac 


atg 




gag 


ctg 


age 


age ctg aga 


tct 


gag 


gac 


aca 


gee 


atg 


tat 


tac 


tgt 


20 


gca 


aga 


ta ! 


l-45# 6 






















aga 


gtc 


acc 


atg acc agg 


gac 


acg 


tec 


acg 


age 


aca 


gtc 


tac 


atg 




gag 


ctg 


age 


age ctg aga 


tct 


gag 


gac 


acg 


gee 


gtg 


tat 


tac 


tgt 




gcg 


aga 


ga ! 


l-46# 7 
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aga 


gtc 


acc 


gag 


ctg 


age 


gcg 


gca 


ga 


aga 


gtc 


a eg 


gag 


ctg 


age 


gcg 


aga 


ga 


aga 


gtc 


acg 


gag 


ctg 


age 


gcg 


aga 


ga 


aga 


gtc 


acc 


gag 


ctg 


age 


gca 


aca 


ga 



VH2 



agg 


etc 


acc 


aca 


atg 


acc 


gca 


cac 


aga 


agg 


etc 


acc 


acc 


atg 


acc 


gca 


egg 


ata 


agg 


etc 


acc 


aca 


atg 


acc 


gca 


egg 


ata 



VH3 



cga 


ttc 


acc 


caa 


atg 


aac 


gcg 


aga 


ga 


cga 


ttc 


acc 


caa 


atg 


aac 


gca 


aaa 


gat 


cga 


ttc 


acc 


caa 


atg 


aac 


gcg 


aga 


ga 


cga 


ttc 


acc 


caa 


atg 


aac 


gca 


aga 


ga 


aga 


ttc 


acc 


caa 


atg 


aac 


acc 


aca 


ga 


cga 


ttc 


acc 



att ace agg 
age ctg aga 
! l-58# 8 
att acc gcg 
age ctg aga 
! l-69# 9 
att acc gcg 
age ctg aga 
! l-e# 10 
ata acc gcg 
age ctg aga 
! l-f# 11 

ate acc aag 
aac atg gac 
c! 2-05# 12 
ate tec aag 
aac atg gac 
c! 2-26# 13 
ate tec aag 
aac atg gac 
c! 2-70# 14 

ate tec aga 
age ctg aga 

3-07# 15 
ate tec aga 
agt ctg aga 
a! 3-09#16 
ate tec agg 
age ctg aga 
! 3-ll# 17 
ate tec aga 
age ctg aga 
! 3-13# 18 
ate tea aga 
age ctg aaa 
! 3-15# 19 
ate tec aga 



gac atg tec 
tec gag gac 

gac gaa tec 
tct gag gac 

gac aaa tec 
tct gag gac 

gac acg tct 
tct gag gac 

gac acc tec 
cct gtg gac 

gac acc tec 
cct gtg gac 

gac acc tee 
cct gtg gac 

gac aac gee 
gec gag gac 

gac aac gee 
get gag gac 

gac aac gee 
gee gag gac 

gaa aat gec 
gee ggg gac 

gat gat tea 
acc gag gac 

gac aac gec 



aca age aca 
acg gec gtg 

acg age aca 
acg gec gtg 

acg age aca 
acg gec gtg 

aca gac aca 
acg gee gtg 

aaa aac cag 
aca gee aca 

aaa age cag 
aca gee aca 

aaa aac cag 
aca gec acg 

aag aac tea 
acg get gtg 

aag aac tec 
acg gee ttg 

aag aac tea 
acg gee gtg 

aag aac tec 
acg get gtg 

aaa aac acg 
aca gee gtg 

aag aac tec 



gee tac atg 
tat tac tgt 

gee tac atg 
tat tac tgt 

gee tac atg 
tat tac tgt 

gec tac atg 
tat tac tgt 



gtg gtc ctt 
tat tac tgt 

gtg gtc ctt 
tat tac tgt 

gtg gtc ctt 
tat tac tgt 

ctg tat ctg 
tat tac tgt 

ctg tat ctg 
tat tac tgt 

ctg tat ctg 
tat tac tgt 

ttg tat ctt 
tat tac tgt 

ctg tat ctg 
tat tac tgt 

ctg tat ctg 
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caa 


atg 


aac 


agt ctg aga 


gee 


gag 


gac 


acg 


gee 


ttg 


tat 


cac 


tgt 


gcg 


aga 


ga 


! 3-20# 20 




















cga 


ttc 


acc 


ate tec aga 


gac 


aac 


gee 


aag 


aac 


tea 


ctg 


tat 


ctg 


caa 


atg 


aac 


age ctg aga 


gee 


gag 


gac 


acg 


get 


gtg 


tat 


tac 


tgt 


gcg 


aga 


ga 


! 3-21# 21 




















egg 


ttc 


acc 


ate tec aga 


gac 


aat 


tec 


aag 


aac 


acg 


ctg 


tat 


ctg 


caa 


atg 


aac 


age ctg aga 


gee 


gag 


gac 


acg 


gee 


gta 


tat 


tac 


tgt 


gcg 


aaa 


ga 


! 3-23# 22 




















cga 


ttc 


acc 


ate tec aga 


gac 


aat 


tec 


aag 


aac 


acg 


ctg 


tat 


ctg 


caa 


atg 


aac 


age ctg aga 


get 


gag 


gac 


acg 


get 


gtg 


tat 


tac 


tgt 


gcg 


aaa 


ga 


! 3-30# 23 




















cga 


ttc 


acc 


ate tec aga 


gac 


aat 


tec 


aag 


aac 


acg 


ctg 


tat 


ctg 


caa 


atg 


aac 


age ctg aga 


get 


gag 


gac 


acg 


get 


gtg 


tat 


tac 


tgt 


gcg 


aga 


ga 


! 3303# 24 




















cga 


ttc 


acc 


ate tec aga 


gac 


aat 


tec 


aag 


aac 


acg 


ctg 


tat 


ctg 


caa 


atg 


aac 


age ctg aga 


get 


gag 


gac 


acg 


get 


gtg 


tat 


tac 


tgt 


gcg 


aaa 


ga 


! 3305# 25 




















cga 


ttc 


acc 


ate tee aga 


gac 


aat 


tec 


aag 


aac 


acg 


ctg 


tat 


ctg 


caa 


atg 


aac 


age ctg aga 


gee 


gag 


gac 


acg 


get 


gtg 


tat 


tac 


tgt 


gcg 


aga 


ga 


! 3-33# 26 




















cga 


ttc 


acc 


ate tec aga 


gac 


aac 


age 


aaa 


aac 


tec 


ctg 


tat 


ctg 


caa 


atg 


aac 


agt ctg aga 


act 


gag 


gac 


acc 


gee 


ttg 


tat 


tac 


tgt 


gca 


aaa 


gat 


a! 3-43#27 




















cga 


ttc 


acc 


ate tec aga 


gac 


aat 


gee 


aag 


aac 


tea 


ctg 


tat 


ctg 


caa 


atg 


aac 


age ctg aga 


gac 


gag 


gac 


acg 


get 


gtg 


tat 


tac 


tgt 


gcg 


aga 


ga 


! 3-48# 28 




















aga 


ttc 


acc 


ate tea aga 


gat 


ggt 


tec 


aaa 


age 


ate 


gee 


tat 


ctg 


caa 


atg 


aac 


age ctg aaa 


acc 


gag 


gac 


aca 


gec 


gtg 


tat 


tac 


tgt 


act 


aga 


ga 


! 3-49# 29 




















cga 


ttc 


acc 


ate tec aga 


gac 


aat 


tec 


aag 


aac 


acg 


ctg 


tat 


ctt 


caa 


atg 


aac 


age ctg aga 


gee 


gag 


gac 


acg 


gec 


gtg 


tat 


tac 


tgt 


gcg 


aga 


ga 


! 3-53# 30 




















aga 


ttc 


acc 


ate tec aga 


gac 


aat 


tec 


aag 


aac 


acg 


ctg 


tat 


ctt 


caa 


atg 


ggc 


age ctg aga 


get 


gag 


gac 


atg 


get 


gtg 


tat 


tac 


tgt 


gcg 


aga 


ga 


! 3-64# 31 




















aga 


ttc 


acc 


ate tec aga 


gac 


aat 


tec 


aag 


aac 


acg 


ctg 


tat 


ctt 


caa 


atg 


aac 


age ctg aga 


get 


gag 


gac 


acg 


get 


gtg 


tat 


tac 


tgt 


gcg 


aga 


ga ! 3-66# 32 




















aga 


ttc 


acc 


ate tea aga 


gat 


gat 


tea 


aag 


aac 


tea 


ctg 


tat 


ctg 
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caa 


atg 


aac 


age ctg aaa 


acc 


get 


aga 


ga 


! 3-72# 33 




agg 


ttc 


acc 


ate tec aga 


gat 


caa 


atg 


aac 


age ctg aaa 


acc 


act 


aga 


ca 


! 3-73# 34 




cga 


ttc 


acc 


ate tec aga 


gac 


caa 


atg 


aac 


agt ctg aga 


gec 


gca 


aga 


ga 


! 3-74# 35 




aga 


ttc 


acc 


ate tec aga 


gac 


caa 


atg 


aac 


age ctg aga 


get 


aag 


aaa 


ga 


! 3-ci# 36 




/H4 










cga 


gtc 


acc 


ata tea gta 


gac 


aag 


ctg 


age 


tct gtg acc 


gee 


gcg 


aga 


ga 


! 4-04# 37 




cga 


gtc 


acc 


atg tea gta 


gac 


aag 


ctg 


age 


tct gtg acc 


gee 


gcg 


aga 


aa 


! 4-28# 38 




cga 


gtt 


acc 


ata tea gta gac 


aag 


ctg 


age 


tct gtg act 


gec 


gcg 


aga 


ga 


! 4301# 39 




cga 


gtc 


acc 


ata tea gta 


gac 


aag 


ctg 


age 


tct gtg acc 


gee 


gec 


aga 


ga 


! 4302# 40 




cga 


gtt 


acc 


ata tea gta 


gac 


aag 


ctg 


age 


tct gtg act 


gec 


gec 


aga 


ga ! 4304# 41 




cga 


gtt 


acc 


ata tea gta 


gac 


aag 


ctg 


age 


tct gtg act 


gec 


gcg 


aga 


ga ! 


! 4-31# 42 




cga 


gtc 


acc 


ata tea gta 


gac 


aag 


ctg 


age 


tct gtg acc 


gee 


gcg 


aga 


ga ! 


4-34# 43 




cga 


gtc 


acc 


ata tec gta 


gac 


aag 


ctg 


age 


tct gtg acc 


gec 


gcg 


aga 


ca ! 


4-39# 44 




cga 


gtc 


acc 


ata tea gta 


gac 


aag 


ctg 


age 


tct gtg acc 


get 


gcg 


aga 


ga ! 


4-59# 45 





gag gac acg gec gtg tat tac tgt 

gat tea aag aac acg gcg tat ctg 
gag gac acg gee gtg tat tac tgt 

aac gec aag aac acg ctg tat ctg 
gag gac acg get gtg tat tac tgt 

aat tec aag aac acg ctg cat ctt 
gag gac acg get gtg tat tac tgt 

aag tec aag aac cag ttc tec ctg 
gcg gac acg gec gtg tat tac tgt 

acg tec aag aac cag ttc tec ctg 
gtg gac acg gee gtg tat tac tgt 

acg tct aag aac cag ttc tec ctg 
gcg gac acg gee gtg tat tac tgt 

agg tec aag aac cag ttc tec ctg 
gcg gac. acg gec gtg tat tac tgt 

acg tec aag aac cag ttc tec ctg 
gca gac acg gee gtg tat tac tgt 

acg tct aag aac cag ttc tec ctg 
gcg gac acg gec gtg tat tac tgt 

acg tec aag aac cag ttc tec ctg 
gcg gac acg get gtg tat tac tgt 

acg tec aag aac cag ttc tec ctg 
gca gac acg get gtg tat tac tgt 

acg tec aag aac cag ttc tec ctg 
gcg gac acg gec gtg tat tac tgt 
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cga gtc acc ata tea gta gac 
aag ctg age tct gtg acc get 
gcg aga ga ! 4-61# 46 
cga gtc acc ata tea gta gac 
aag ctg age tct gtg acc gee 
gcg aga ga ! 4-b# 47 
! VH5 

cag gtc acc ate tea gec gac 
cag tgg age age ctg aag gee 
gcg aga ca ! 5-51# 48 
cac gtc acc ate tea get gac 
cag tgg age age ctg aag gee 
gcg aga ! 5-a# 49 
! VH6 

cga ata acc ate aac cca gac 
cag ctg aac tct gtg act ccc 
gca aga ga ! 6-l# 50 
! VH7 

egg ttt gtc ttc tec ttg gac 
cag ate tgc age eta aag get 
gcg aga ga ! 74. 1# 51 



acg tec aag aac cag ttc tec ctg 
gcg gac acg gee gtg tat tac tgt 

acg tec aag aac cag ttc tec ctg 
gca gac acg gee gtg tat tac tgt 

aag tec ate age acc gee tac ctg 
teg gac acc gec atg tat tac tgt 

aag tec ate age act gee tac ctg 
teg gac acc gee atg tat tac tgt 

aca tec aag aac cag ttc tec ctg 
gag gac acg get gtg tat tac tgt 

acc tct gtc age acg gca tat ctg 
gag gac act gec gtg tat tac tgt 
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J 



Table 250: REdaptors, Extenders, and Bridges used for Cleavage and Capture of 
Human Heavy Chains in FR3. 

A: HpyCH4V Probes of actual human HC genes 

!HpyCH4V in FR3 of human HC, bases 35-56; only those with TGca site 
TGca; 10, 

RE recognition: tgca of length 4 is expected at 10 

1  6-1  agttctccctgcagctgaactc
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10 






2  3-11, 3-07, 3-21, 3-72, 3-48  cactgtatctgcaaatgaacag

3  3-09, 3-43, 3-20  ccctgtatctgcaaatgaacag

4  5-51  ccgcctacctgcagtggagcag

5  3-15, 3-30, 3-30.5, 3-30.3, 3-74, 3-23, 3-33  cgctgtatctgcaaatgaacag

6  7-4-1  cggcatatctgcagatctgcag

7  3-73  cggcgtatctgcaaatgaacag

8  5-a  ctgcctacctgcagtggagcag

9  3-49  tcgcctatctgcaaatgaacag
B: HpyCH4V REdaptors, Extenders, and Bridges

B.1 REdaptors 

! Cutting HC lower strand:

! TmKeller for 100 mM NaCl, zero formamide
! REdaptors for cleavage

                                                 TmW    TmK
(ON_HCFR36-1)     5'-agttctcccTGCAgctgaactc-3'   68.0   64.5
(ON_HCFR36-1A)    5'-ttctcccTGCAgctgaactc-3'     62.0   62.5
(ON_HCFR36-1B)    5'-ttctcccTGCAgctgaac-3'       56.0   59.9
(ON_HCFR33-15)    5'-cgctgtatcTGCAaatgaacag-3'   64.0   60.8
(ON_HCFR33-15A)   5'-ctgtatcTGCAaatgaacag-3'     56.0   56.3
(ON_HCFR33-15B)   5'-ctgtatcTGCAaatgaac-3'       50.0   53.1
(ON_HCFR33-11)    5'-cactgtatcTGCAaatgaacag-3'   62.0   58.9
(ON_HCFR35-51)    5'-ccgcctaccTGCAgtggagcag-3'   74.0   70.1
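
The first melting-temperature column for these REdaptors agrees with the simple Wallace rule, Tm = 2(A+T) + 4(G+C). That reading is an inference from the listed values, not something the table states, and the second column (TmKeller, 100 mM NaCl) uses a different formula that is not reproduced here. A minimal sketch, with `wallace_tm` as an illustrative name:

```python
def wallace_tm(oligo):
    """Wallace-rule melting temperature in degrees C:
    2 for each A or T plus 4 for each G or C. Case is ignored."""
    s = oligo.lower()
    gc = s.count("g") + s.count("c")
    at = s.count("a") + s.count("t")
    return 4 * gc + 2 * at


# REdaptors from section B.1 with the first Tm column of the table.
REDAPTORS = {
    "ON_HCFR36-1":   ("agttctcccTGCAgctgaactc", 68),
    "ON_HCFR36-1A":  ("ttctcccTGCAgctgaactc", 62),
    "ON_HCFR36-1B":  ("ttctcccTGCAgctgaac", 56),
    "ON_HCFR33-15":  ("cgctgtatcTGCAaatgaacag", 64),
    "ON_HCFR33-15A": ("ctgtatcTGCAaatgaacag", 56),
    "ON_HCFR33-15B": ("ctgtatcTGCAaatgaac", 50),
    "ON_HCFR33-11":  ("cactgtatcTGCAaatgaacag", 62),
    "ON_HCFR35-51":  ("ccgcctaccTGCAgtggagcag", 74),
}

computed = {name: wallace_tm(seq) for name, (seq, _) in REDAPTORS.items()}
```

Trimming two bases from either end of an oligonucleotide lowers the value by 4 to 8 degrees, which is the pattern seen across the -1, -1A and -1B variants.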



B.2 Segment of synthetic 3-23 gene into which captured CDR3 is to be cloned

                   XbaI...

D323* cgCttcacTaag tcT aga gac aaC tcT aag aaT acT ctc taC
      scab designed gene 3-23 gene

      HpyCH4V

                      AflII...

      Ttg caG atg aac agc TtA agG . . .



B3 Extender and Bridges 

! Extender (bottom strand) : 
! 

(ON_HCHpyEx01) 5'-cAAgTAgAgAgTATTcTTAgAgTTgTcTcTAgAcTTAgTgAAgcg-3'
! ON_HCHpyEx01 is the reverse complement of

! 5'-cgCttcacTaag tcT aga gac aaC tcT aag aaT acT ctc taC Ttg-3'



! Bridges (top strand, 9-base overlap) 
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(ON_HCHpyBr016-1) 5'-cgCttcacTaag tcT aga gac aaC tcT aag-
                     aaT acT ctc taC Ttg CAgctgaac-3' {3'-term C is blocked}
!
! 3-15 et al. + 3-11
(ON_HCHpyBr023-15) 5'-cgCttcacTaag tcT aga gac aaC tcT aag-
                     aaT acT ctc taC Ttg CAaatgaac-3' {3'-term C is blocked}
!
! 5-51
(ON_HCHpyBr045-51) 5'-cgCttcacTaag tcT aga gac aaC tcT aag-
                     aaT acT ctc taC Ttg CAgtggagc-3' {3'-term C is blocked}
!
! PCR primer (top strand)
!
(ON_HCHpyPCR) 5'-cgCttcacTaag tcT aga gac-3'
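
Each bridge above is the top strand of the synthetic 3-23 segment followed by a 9-base overlap, and that overlap is the part of the matching REdaptor that lies just downstream of the HpyCH4V cut. The sketch below derives the overlap from the REdaptor. It assumes HpyCH4V cuts TG^CA (a blunt cut in the middle of the site) and that the scaffold carries `aga` at the XbaI site, as in the extender comment line; `bridge_overlap` and `SCAFFOLD` are illustrative names.

```python
# Top-strand scaffold shared by all three bridges (codon spacing removed).
SCAFFOLD = "cgCttcacTaagtcTagagacaaCtcTaagaaTacTctctaCTtg"


def bridge_overlap(redaptor, site="TGCA", cut_after=2, overlap=9):
    """Return the `overlap` bases of the REdaptor that start at the cut.
    The cut is placed `cut_after` bases into the first occurrence of `site`."""
    start = redaptor.upper().find(site)
    if start < 0:
        raise ValueError("site not found in REdaptor")
    cut = start + cut_after
    return redaptor[cut:cut + overlap]


def build_bridge(redaptor):
    """Scaffold followed by the 9-base overlap taken from the REdaptor."""
    return SCAFFOLD + bridge_overlap(redaptor)


bridge_6_1 = build_bridge("agttctcccTGCAgctgaactc")     # from ON_HCFR36-1
bridge_3_15 = build_bridge("cgctgtatcTGCAaatgaacag")    # from ON_HCFR33-15
bridge_5_51 = build_bridge("ccgcctaccTGCAgtggagcag")    # from ON_HCFR35-51
```

The PCR primer is the first 21 bases of the same scaffold, so it primes on any product that has been extended across a bridge.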



C: BlpI Probes from human HC GLGs

 1  1-58, 1-03, 1-08, 1-69, 1-24, 1-45, 1-46, 1-f, 1-e  acatggaGCTGAGCagcctgag
 2  1-02  acatggaGCTGAGCaggctgag
 3  1-18  acatggagctgaggagcctgag
 4  5-51, 5-a  acctgcagtggagcagcctgaa
 5  3-15, 3-73, 3-49, 3-72  atctgcaaatgaacagcctgaa
 6  3303, 3-33, 3-07, 3-11, 3-30, 3-21, 3-23, 3305, 3-48  atctgcaaatgaacagcctgag
 7  3-20, 3-74, 3-09, 3-43  atctgcaaatgaacagtctgag
 8  74.1  atctgcagatctgcagcctaaa
 9  3-66, 3-13, 3-53, 3-d  atcttcaaatgaacagcctgag
10  3-64  atcttcaaatgggcagcctgag
11  4301, 4-28, 4302, 4-04, 4304, 4-31, 4-34, 4-39, 4-59, 4-61, 4-b  ccctgaaGCTGAGCtctgtgac
12  6-1  ccctgcagctgaactctgtgac
13  2-70, 2-05  tccttacaatgaccaacatgga
14  2-26  tccttaccatgaccaacatgga
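
Only some of the probes above carry a BlpI site, which the listing marks in upper case. A short scan makes this explicit. The sketch assumes the BlpI recognition sequence is GCTNAGC and uses the 14 probe sequences exactly as listed; `probes_with_site` is an illustrative name.

```python
import re

# BlpI recognition sequence GCTNAGC, with N restricted to the four bases.
BLPI_SITE = re.compile(r"GCT[ACGT]AGC")

PROBES = {
    1: "acatggaGCTGAGCagcctgag",
    2: "acatggaGCTGAGCaggctgag",
    3: "acatggagctgaggagcctgag",
    4: "acctgcagtggagcagcctgaa",
    5: "atctgcaaatgaacagcctgaa",
    6: "atctgcaaatgaacagcctgag",
    7: "atctgcaaatgaacagtctgag",
    8: "atctgcagatctgcagcctaaa",
    9: "atcttcaaatgaacagcctgag",
    10: "atcttcaaatgggcagcctgag",
    11: "ccctgaaGCTGAGCtctgtgac",
    12: "ccctgcagctgaactctgtgac",
    13: "tccttacaatgaccaacatgga",
    14: "tccttaccatgaccaacatgga",
}


def probes_with_site(probes, pattern=BLPI_SITE):
    """Return the ids of probes whose sequence contains the site (case-insensitive)."""
    return [pid for pid, seq in probes.items() if pattern.search(seq.upper())]


cleavable = probes_with_site(PROBES)
```

Probe 3 (1-18) differs from probe 1 by a single base inside the site (`gctgagg` instead of `GCTGAGC`), which is why it is listed without upper-case marking.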



D: BlpI REdaptors, Extenders, and Bridges
D.1 REdaptors

                                                   TmW   TmK

(BlpF3HC1-58) 5'-ac atg gaG CTG AGC agc ctg ag-3'   70   66.4

(BlpF3HC6-1)  5'-cc ctg aag ctg agc tct gtg ac-3'   70   66.4

! BlpF3HC6-1 matches 4-30.1, not 6-1.

D.2 Segment of synthetic 3-23 gene into which captured CDR3 is to be cloned 
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! BlpI 

!                   XbaI...

!D323* cgCttcacTaag TCT AGA gac aaC tcT aag aaT acT ctC taC Ttg caG atg aac

!         AflII...
!      agC TTA AGG

D3 Extender and Bridges 

! Bridges 

(BlpF3Br1) 5'-cgCttcacTcag tcT aga gaT aaC AGT aaA aaT acT TtG-
              taC Ttg caG Ctg a|GC agc ctg-3'
(BlpF3Br2) 5'-cgCttcacTcag tcT aga gaT aaC AGT aaA aaT acT TtG-
              taC Ttg caG Ctg a|gc tct gtg-3'
! | lower strand is cut here

! Extender
(BlpF3Ext) 5'-TcAgcTgcAAgTAcAAAgTATTTTTAcTgTTATcTcTAgAcTgAgTgAAgcg-3'
! BlpF3Ext is the reverse complement of:

! 5'-cgCttcacTcag tcT aga gaT aaC AGT aaA aaT acT TtG taC Ttg caG Ctg a-3'
(BlpF3PCR) 5'-cgCttcacTcag tcT aga gaT aaC-3'



E: HpyCH4III Distinct GLG sequences surrounding site, bases 77-98

 1  102#1, 118#4, 146#7, 169#9, 1e#10, 311#17, 353#30, 404#37, 4301  ccgtgtattactgtgcgagaga
 2  103#2, 307#15, 321#21, 3303#24, 333#26, 348#28, 364#31, 366#32  ctgtgtattactgtgcgagaga
 3  108#3  ccgtgtattactgtgcgagagg
 4  124#5, 1f#11  ccgtgtattactgtgcaacaga
 5  145#6  ccatgtattactgtgcaagata
 6  158#8  ccgtgtattactgtgcggcaga
 7  205#12  ccacatattactgtgcacacag
 8  226#13  ccacatattactgtgcacggat
 9  270#14  ccacgtattactgtgcacggat
10  309#16, 343#27  ccttgtattactgtgcaaaaga
11  313#18, 374#35, 61#50  ctgtgtattactgtgcaagaga
12  315#19  ccgtgtattactgtaccacaga
13  320#20  ccttgtatcactgtgcgagaga
14  323#22  ccgtatattactgtgcgaaaga
15  330#23, 3305#25  ctgtgtattactgtgcgaaaga
16  349#29  ccgtgtattactgtactagaga
17  372#33  ccgtgtattactgtgctagaga
18  373#34  ccgtgtattactgtactagaca
19  3d#36  ctgtgtattactgtaagaaaga
20  428#38  ccgtgtattactgtgcgagaaa
21  4302#40, 4304#41  ccgtgtattactgtgccagaga
22  439#44  ctgtgtattactgtgcgagaca
23  551#48  ccatgtattactgtgcgagaca
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24  5a#49  ccatgtattactgtgcgaga

F: HpyCH4III REdaptors, Extenders, and Bridges
F.1 REdaptors

! ONs for cleavage of HC (lower) in FR3 (bases 77-97)
! For cleavage with HpyCH4III, Bst4CI, or TaaI
! cleavage is in lower chain before base 88.

!                        77 788 888 888 889 999 999 9
!                        78 901 234 567 890 123 456 7      TmW   TmK
(H43.77.97.1-02#1)    5'-cc gtg tat tAC TGT gcg aga g-3'    64   62.6
(H43.77.97.1-03#2)    5'-ct gtg tat tAC TGT gcg aga g-3'    62   60.6
(H43.77.97.108#3)     5'-cc gtg tat tAC TGT gcg aga g-3'    64   62.6
(H43.77.97.323#22)    5'-cc gta tat tac tgt gcg aaa g-3'    60   58.7
(H43.77.97.330#23)    5'-ct gtg tat tac tgt gcg aaa g-3'    60   58.7
(H43.77.97.439#44)    5'-ct gtg tat tac tgt gcg aga c-3'    62   60.6
(H43.77.97.551#48)    5'-cc atg tat tac tgt gcg aga c-3'    62   60.6
(H43.77.97.5a#49)     5'-cc atg tat tAC TGT gcg aga-3'      58   58.3



F.2 Extender and Bridges 

! XbaI and AflII sites in bridges are bunged

(H43.XABr1) 5'-ggtgtagtga-
   |TCT|AGt|gac|aac|tct|aag|aat|act|ctc|tac|ttg|cag|atg|-
   |aac|agC|TTt|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat tgt gcg aga-3'

(H43.XABr2) 5'-ggtgtagtga-
   |TCT|AGt|gac|aac|tct|aag|aat|act|ctc|tac|ttg|cag|atg|-
   |aac|agC|TTt|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat tgt gcg aaa-3'

(H43.XAExt) 5'-ATAgTAgAcTgcAgTgTccTcAgcccTTAAgcTgTTcATcTgcAAgTAgA-
               gAgTATTcTTAgAgTTgTcTcTAgATcAcTAcAcc-3'

! H43.XAExt is the reverse complement of

! 5'-ggtgtagtga-
!  |TCT|AGA|gac|aac|tct|aag|aat|act|ctc|tac|ttg|cag|atg|-
!  |aac|agC|TTA|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat -3'

(H43.XAPCR) 5'-ggtgtagtga|TCT|AGA|gac|aac-3'
! XbaI and AflII sites in bridges are bunged
(H43.ABr1) 5'-ggtgtagtga-
   |aac|agC|TTt|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat tgt gcg aga-3'
(H43.ABr2) 5'-ggtgtagtga-
   |aac|agC|TTt|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat tgt gcg aaa-3'
(H43.AExt) 5'-ATAgTAgAcTgcAgTgTccTcAgcccTTAAgcTgTTTcAcTAcAcc-3'
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! (H43.AExt) is the reverse complement of 5'-ggtgtagtga-
!  |aac|agC|TTA|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat -3'
(H43.APCR) 5'-ggtgtagtga|aac|agC|TTA|AGg|gct|g-3'
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lHto|Bz>g|-efe(BHS|Bo ^|^CT|B6B| < eoo|TOO|Hfi6|o^6 |eTO]egJ5|TOJ \boo\ 



Sfr frfr Gfr 3* Tfr Ofr 6G 8G Z.C 9G SC frC G€ 3E IG i 
I>LI , 

i 

I I«JW l i 

I pod | pfip I oq.o | qpp | obe I q.q.6 | peo | qq.o j 

0G 63 83 LZ 93 S3 *3 G3 I 
(eZ-CA/LfrdCl) IHJ i 

IOON j 

. ■ ■ • ikoBn j 

IT?S qPOS j 

fifio opq. fifio ofifi oq.fi fifio ofi qq.o pfip op£- , £ 
63 oofi j&ve ODD «po JBeo 009 qo fbd q.oq. i>^o- , 9 

v w v a o y i 

33 13 03 6X 81 LI i 



tittoqs suopoo paqpfiaxaPA x&j*i sjjowarapjj HA G3-GA :Q09 ^TcpM* 
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Sites to be varied--->          *** *** ***

   ---FR1------------------>|...CDR1............|---FR2-----
     46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
      A   S   G   F   T   F   S   S   Y   A   M   S   W   V   R
   |gct|TCC|GGA|ttc|act|ttc|tct|tCG|TAC|Gct|atg|tct|tgg|gtt|cgC| 143
   |cga|agg|cct|aag|tga|aag|aga|agc|atg|cga|tac|aga|acc|caa|gcg|
       | BspEI |               | BsiWI    |               |BstXI.

Sites to be varied--->                          ***     *** ***

   ---FR2------------------------------------->|...CDR2.......
     61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
      Q   A   P   G   K   G   L   E   W   V   S   A   I   S   G
   |CAa|gct|ccT|GGt|aaa|ggt|ttg|gag|tgg|gtt|tct|gct|atc|tct|ggt| 188
   |gtt|cga|gga|cca|ttt|cca|aac|ctc|acc|caa|aga|cga|tag|aga|cca|
   ...BstXI      |

                ***     ***

   ....CDR2...........................................|---FR3---
     76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
      S   G   G   S   T   Y   Y   A   D   S   V   K   G   R   F
   |tct|ggt|ggc|agt|act|tac|tat|gct|gac|tcc|gtt|aaa|ggt|cgc|ttc| 233
   |aga|cca|ccg|tca|tga|atg|ata|cga|ctg|agg|caa|ttt|cca|gcg|aag|

   --------FR3--------------------------------------------------
     91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
      T   I   S   R   D   N   S   K   N   T   L   Y   L   Q   M
   |act|atc|TCT|AGA|gac|aac|tct|aag|aat|act|ctc|tac|ttg|cag|atg| 278
   |tga|tag|aga|tct|ctg|ttg|aga|ttc|tta|tga|gag|atg|aac|gtc|tac|
           | XbaI  |

   ---FR3----------------------------------------------------->|
    106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
      N   S   L   R   A   E   D   T   A   V   Y   Y   C   A   K
   |aac|agC|TTA|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat|tgc|gct|aaa| 323
   |ttg|tcg|aat|tcc|cga|ctc|ctg|tga|cgt|cag|atg|ata|acg|cga|ttt|
         |AflII   |               | PstI   |

   .......CDR3.................................|----FR4---------
    121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
      D   Y   E   G   T   G   Y   A   F   D   I   W   G   Q   G
   |gac|tat|gaa|ggt|act|ggt|tat|gct|ttc|gaC|ATA|TGg|ggt|caa|ggt| 368
   |ctg|ata|ctt|cca|tga|cca|ata|cga|aag|ctg|tat|acc|cca|gtt|cca|
                                         | NdeI  |

   ---------FR4-------------->|
    136 137 138 139 140 141 142
      T   M   V   T   V   S   S
   |act|atG|GTC|ACC|gtc|tct|agt- 389
   |tga|tac|cag|tgg|cag|aga|tca-
         | BstEII |

    143 144 145 146 147 148 149 150 151 152
      A   S   T   K   G   P   S   V   F   P
    gcc tcc acc aaG GGC CCa tcg GTC TTC ccc-3' 419
    cgg agg tgg ttc ccg ggt agc cag aag ggg-5'
                  Bsp120I.      BbsI...(2/2)
                  ApaI....



(SFPRMET) 5'-ctg tct gaa cG GCC cag ccG-3'
(TOPFR1A) 5'-ctg tct gaa cG GCC cag ccG GCC atg gcc-
             gaa|gtt|CAA|TTG|tta|gag|tct|ggt|-
             |ggc|ggt|ctt|gtt|cag|cct|ggt|ggt|tct|tta-3'
(BOTFR1B) 3'-caa|gtc|gga|cca|cca|aga|aat|gca|gaa|aga|acg|cga|-
             |cga|agg|cct|aag|tga|aag-5' ! bottom strand
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(BOTFR2) 3'-acc|caa|gcg|-
            |gtt|cga|gga|cca|ttt|cca|aac|ctc|acc|caa|aga|-5' ! bottom strand
(BOTFR3) 3'-a|cga|ctg|agg|caa|ttt|cca|gcg|aag|-
            |tga|tag|aga|tct|ctg|ttg|aga|ttc|tta|tga|gag|atg|aac|gtc|tac|-
            |ttg|tcg|aat|tcc|cga|ctc|ctg|tga-5'

(F06) 5'-gC|TTA|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat|tgc|gct|aaa|-
         |gac|tat|gaa|ggt|act|ggt|tat|gct|ttc|gaC|ATA|TGg|ggt|c-3'
(BOTFR4) 3'-cga|aag|ctg|tat|acc|cca|gtt|cca|-
            |tga|tac|cag|tgg|cag|aga|tca-
            cgg agg tgg ttc ccg ggt agc cag aag ggg-5' ! bottom strand

(BOTPRCPRIM) 3'-gg ttc ccg ggt agc cag aag ggg-5'

! CDR1 diversity 
! 

(ON-vgC1) 5'-|gct|TCC|GGA|ttc|act|ttc|tct|<1>|TAC|<1>|atg|<1>|-

!                            CDR1 ...................  6859

             |tgg|gtt|cgC|CAa|gct|ccT|GG-3'

! <1> stands for an equimolar mix of {ADEFGHIKLMNPQRSTVWY}; no C
! (this is not a sequence)

! CDR2 diversity

!

(ON-vgC2) 5'-ggt|ttg|gag|tgg|gtt|tct|<2>|atc|<2>|<3>|-

!                                    CDR2

             |tct|ggt|ggc|<1>|act|<1>|tat|gct|gac|tcc|gtt|aaa|gg-3'

!                         CDR2

! <1> is an equimolar mixture of {ADEFGHIKLMNPQRSTVWY}; no C

! <2> is an equimolar mixture of {YRWVGS}; no ACDEFHIKLMNPQT

! <3> is an equimolar mixture of {PS}; no ACDEFGHIKLMNQRTVWY
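
The number 6859 printed beside the CDR1 oligonucleotide equals the count of distinct amino-acid combinations that three `<1>` positions can encode (19 x 19 x 19). The same arithmetic can be applied to CDR2. The sketch below only multiplies the sizes of the mixtures as defined in the comment lines; the CDR2 and combined figures are computed here and are not stated in the listing, and `diversity` is an illustrative name.

```python
# Amino-acid mixtures as defined in the comment lines of the listing.
MIXES = {
    "<1>": "ADEFGHIKLMNPQRSTVWY",   # 19 residues, no C
    "<2>": "YRWVGS",                # 6 residues
    "<3>": "PS",                    # 2 residues
}


def diversity(variegated_positions):
    """Number of distinct amino-acid combinations for a list of mix codes."""
    total = 1
    for code in variegated_positions:
        total *= len(MIXES[code])
    return total


# Variegated positions in the order they occur in ON-vgC1 and ON-vgC2.
CDR1_POSITIONS = ["<1>", "<1>", "<1>"]
CDR2_POSITIONS = ["<2>", "<2>", "<3>", "<1>", "<1>"]

cdr1_diversity = diversity(CDR1_POSITIONS)
cdr2_diversity = diversity(CDR2_POSITIONS)
combined_diversity = diversity(CDR1_POSITIONS + CDR2_POSITIONS)
```

Because every mixture is equimolar, each combination is expected at equal frequency, so these counts are also the library sizes needed to sample each variant once on average.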
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Table 800 (new) 

The following list of enzymes was taken from
http://rebase.neb.com/cgi-bin/asymmlist.

I have removed the enzymes that a) cut within the recognition sequence, b)
cut on both sides of the recognition sequence, or c) have fewer than 2
bases between the recognition sequence and the closest cut site.

REBASE Enzymes 
04/13/2001 



Isoschizomers 



BfuI
BmrI



BspLU11III
Acc36I 



Suppliers 

y 



Type II restriction enzymes with asymmetric recognition 
sequences : 

Enzymes    Recognition Sequence

AarI       CACCTGCNNNN^NNNN_
AceIII     CAGCTCNNNNNNN^NNNN_
Bbr7I      GAAGACNNNNNNN^NNNN_
BbvI       GCAGCNNNNNNNN^NNNN_
BbvII      GAAGACNN^NNNN_
Bce83I     CTTGAGNNNNNNNNNNNNNN_NN^          -
BceAI      ACGGCNNNNNNNNNNNN^NN_             -
BcefI      ACGGCNNNNNNNNNNNN^N_
BciVI      GTATCCNNNNN_N^
BfiI       ACTGGGNNNN_N^
BinI       GGATCNNNN^N_
BscAI      GCATCNNNN^NN_
BseRI      GAGGAGNNNNNNNN_NN^
BsmFI      GGGACNNNNNNNNNN^NNNN_
BspMI      ACCTGCNNNN^NNNN_
EciI       GGCGGANNNNNNNNN_NN^
Eco57I     CTGAAGNNNNNNNNNNNNNN_NN^          BspKT5I
FauI       CCCGCNNNN^NN_                     BstFZ438I
FokI       GGATGNNNNNNNNN^NNNN_              BstPZ418I
GsuI       CTGGAGNNNNNNNNNNNNNN_NN^
HgaI       GACGCNNNNN^NNNNN_
HphI       GGTGANNNNNNN_N^                   AsuHPI
MboII      GAAGANNNNNNN_N^
MlyI       GAGTCNNNNN^                       SchI
MmeI       TCCRACNNNNNNNNNNNNNNNNNN_NN^
MnlI       CCTCNNNNNN_N^
PleI       GAGTCNNNN^N_                      PpsI
RleAI      CCCACANNNNNNNNN_NNN^
SfaNI      GCATCNNNNN^NNNN_                  BspST5I
SspD5I     GGTGANNNNNNNN^
Sth132I    CCCGNNNN^NNNN_
StsI       GGATGNNNNNNNNNN^NNNN_             -
TaqII      GACCGANNNNNNNNN_NN^, CACCCANNNNNNNNN_NN^
Tth111II   CAARCANNNNNNNNN_NN^
UbaPI      CGAACG



y 
y 



y 
y 
y 
y 
y 
y 
y 
y 
y 
y 
y 
y 

y 
y 
The notation is: ^ means cut the upper strand and _ means cut the
lower strand. If the upper and lower strands are cut at the same
place, then only ^ appears.
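
The notation can be read mechanically: the fixed bases form the recognition sequence, and the number of N's before each mark gives the distance from the recognition sequence to the cut on that strand. The following is a minimal sketch under that reading; the function name `parse_site` is hypothetical, and the code assumes entries of the kind kept in this table, where both cuts lie downstream of the recognition sequence and the recognition sequence itself does not end in N.

```python
def parse_site(pattern: str):
    """Split e.g. 'GAAGACNN^NNNN_' into (recognition, upper_cut, lower_cut).

    Cut offsets count bases downstream of the last recognition base.
    A missing '_' means both strands are cut at the '^' position.
    """
    bases = pattern.replace("^", "").replace("_", "")
    recognition = bases.rstrip("N")  # assumes the site does not end in N
    site_len = len(recognition)

    def offset(mark: str):
        idx = pattern.find(mark)
        if idx < 0:
            return None
        # Count real bases before the mark, skipping the other mark symbol.
        before = sum(ch not in "^_" for ch in pattern[:idx])
        return before - site_len

    upper = offset("^")
    lower = offset("_")
    if lower is None:
        lower = upper
    return recognition, upper, lower


for entry in ("GAAGACNN^NNNN_", "GAGGAGNNNNNNNN_NN^", "GAGTCNNNNN^"):
    print(entry, parse_site(entry))
```

For the BbvII entry this gives a cut 2 bases downstream on the upper strand and 6 on the lower strand, that is, a four-base 5' overhang; for BseRI the lower-strand cut comes first, giving a two-base 3' overhang.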
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Table 120: MALIA3, annotated 
! MALIA3 9532 bases
!
   1 aat gct act act att agt aga att gat gcc acc ttt tca gct cgc gcc
!    gene ii continued
  49 cca aat gaa aat ata gct aaa cag gtt att gac cat ttg cga aat gta
  97 tct aat ggt caa act aaa tct act cgt tcg cag aat tgg gaa tca act
 145 gtt aca tgg aat gaa act tcc aga cac cgt act tta gtt gca tat tta
 193 aaa cat gtt gag cta cag cac cag att cag caa tta agc tct aag cca
 241 tcc gca aaa atg acc tct tat caa aag gag caa tta aag gta ctc tct
 289 aat cct gac ctg ttg gag ttt gct tcc ggt ctg gtt cgc ttt gaa gct
 337 cga att aaa acg cga tat ttg aag tct ttc ggg ctt cct ctt aat ctt
 385 ttt gat gca atc cgc ttt gct tct gac tat aat agt cag ggt aaa gac
 433 ctg att ttt gat tta tgg tca ttc tcg ttt tct gaa ctg ttt aaa gca
 481 ttt gag ggg gat tca ATG aat att tat gac gat tcc gca gta ttg gac
!                        Start gene x, ii continues
 529 gct atc cag tct aaa cat ttt act att acc ccc tct ggc aaa act tct
 577 ttt gca aaa gcc tct cgc tat ttt ggt ttt tat cgt cgt ctg gta aac
 625 gag ggt tat gat agt gtt gct ctt act atg cct cgt aat tcc ttt tgg
 673 cgt tat gta tct gca tta gtt gaa tgt ggt att cct aaa tct caa ctg
 721 atg aat ctt tct acc tgt aat aat gtt gtt ccg tta gtt cgt ttt att
 769 aac gta gat ttt tct tcc caa cgt cct gac tgg tat aat gag cca gtt
 817 ctt aaa atc gca TAA
!                    End X & II
 832 ggtaattca ca
!
!    M1              E5                  Q10                 T15
 843 ATG att aaa gtt gaa att aaa cca tct caa gcc caa ttt act act cgt
!    Start gene V
!
!    S17         S20                 P25                 E30
 891 tct ggt gtt tct cgt cag ggc aag cct tat tca ctg aat gag cag ctt
!
!            V35                 E40                 V45
 939 tgt tac gtt gat ttg ggt aat gaa tat ccg gtt ctt gtc aag att act
!
!        D50                 A55                 L60
 987 ctt gat gaa ggt cag cca gcc tat gcg cct ggt cTG TAC Acc gtt cat
!                                                 BsrGI...
!    L65                 V70                 S75                 R80
1035 ctg tcc tct ttc aaa gtt ggt cag ttc ggt tcc ctt atg att gac cgt
!
!                    P85     K87  end of V
1083 ctg cgc ctc gtt ccg gct aag TAA C
!
1108 ATG gag cag gtc gcg gat ttc gac aca att tat cag gcg atg
!    Start gene VII
!
1150 ata caa atc tcc gtt gta ctt tgt ttc gcg ctt ggt ata atc
!
!    VII and IX overlap.
!                            S2 V3  L4  V5                  S10
1192 gct ggg ggt caa agA TGA gt gtt tta gtg tat tct ttc gcc tct ttc gtt
!                        End VII
!                      (start IX
!    L13     W15                 G20                 T25             E29
1242 tta ggt tgg tgc ctt cgt agt ggc att acg tat ttt acc cgt tta atg gaa
!
1293 act tcc tc
!
!    .... stop of IX, IX and VIII overlap by four bases
1301 ATG aaa aag tct tta gtc ctc aaa gcc tct gta gcc gtt gct acc ctc
!    Start signal sequence of viii.
!
1349 gtt ccg atg ctg tct ttc gct gct gag ggt gac gat ccc gca aaa gcg
!                                mature VIII >
1397 gcc ttt aac tcc ctg caa gcc tca gcg acc gaa tat atc ggt tat gcg
1445 tgg gcg atg gtt gtt gtc att
1466 gtc ggc gca act atc ggt atc aag ctg ttt aag
1499 aaa ttc acc tcg aaa gca ! 1516
!    -35
!
1517 agc tga taaaccgat acaattaaag gctccttttg
!    -10 ...
!
1552 gagccttttt ttttGGAGAt ttt ! S.D. underlined
!
!    < in signal sequence >
!          M   K   K   L   L   F   A   I   P   L   V
1575 caac GTG aaa aaa tta tta ttc gca att cct tta gtt ! 1611
!
!     V   P   F   Y   S   H   S   A   Q
1612 gtt cct ttc tat tct cac aGT gcA Cag tCT
!                             ApaLI...
!
1642 GTC GTG ACG CAG CCG CCC TCA GTG TCT GGG GCC CCA GGG CAG
     AGG GTC ACC ATC TCC TGC ACT GGG AGC AGC TCC AAC ATC GGG GCA
!      BstEII...
1729 GGT TAT GAT GTA CAC TGG TAC CAG CAG CTT CCA GGA ACA GCC CCC AAA
1777 CTC CTC ATC TAT GGT AAC AGC AAT CGG CCC TCA GGG GTC CCT GAC CGA
1825 TTC TCT GGC TCC AAG TCT GGC ACC TCA GCC TCC CTG GCC ATC ACT
1870 GGG CTC CAG GCT GAG GAT GAG GCT GAT TAT
1900 TAC TGC CAG TCC TAT GAC AGC AGC CTG AGT
1930 GGC CTT TAT GTC TTC GGA ACT GGG ACC AAG GTC ACC GTC
!                                          BstEII...
1969 CTA GGT CAG CCC AAG GCC AAC CCC ACT GTC ACT
2002 CTG TTC CCG CCC TCC TCT GAG GAG CTC CAA GCC AAC AAG GCC ACA CTA
2050 GTG TGT CTG ATC AGT GAC TTC TAC CCG GGA GCT GTG ACA GTG GCC TGG
2098 AAG GCA GAT AGC AGC CCC GTC AAG GCG GGA GTG GAG ACC ACC ACA CCC
2146 TCC AAA CAA AGC AAC AAC AAG TAC GCG GCC AGC AGC TAT CTG AGC CTG
2194 ACG CCT GAG CAG TGG AAG TCC CAC AGA AGC TAC AGC TGC CAG GTC ACG
2242 CAT GAA GGG AGC ACC GTG GAG AAG ACA GTG GCC CCT ACA GAA TGT TCA
2290 TAA TAA ACCG CCTCCACC GG GCGCGCCA AT TCTATTTCAA GGAGACAGTC ATA
!                           AscI
!
!    PelB signal >
!     M   K   Y   L   L   P   T   A   A   A   G   L   L   L   L
2343 ATG AAA TAC CTA TTG CCT ACG GCA GCC GCT GGA TTG TTA TTA CTC
!
!    16  17  18  19  20  21  22
!     A   A   Q   P   A   M   A
2388 gcG GCC cag ccG GCC atg gcc
!      SfiI
!                  NgoMI...(1/2)
!                     NcoI
!
!    FR1 (DP47/V3-23)
!     23  24  25  26  27  28  29  30
!      E   V   Q   L   L   E   S   G
2409 |gaa|gtt|CAA|TTG|tta|gag|tct|ggt|
!            | MfeI  |
!
!    FR1
!     31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
!      G   G   L   V   Q   P   G   G   S   L   R   L   S   C   A
2433 |ggc|ggt|ctt|gtt|cag|cct|ggt|ggt|tct|tta|cgt|ctt|tct|tgc|gct|
!
!    ------------FR1------------>|......CDR1.........|----FR2----
!     46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
!      A   S   G   F   T   F   S   S   Y   A   M   S   W   V   R
2478 |gct|TCC|GGA|ttc|act|ttc|tct|tCG|TAC|Gct|atg|tct|tgg|gtt|cgC|
!        | BspEI |                 | BsiWI|                     |BstXI.
!
!    --------------------FR2-------------------->|.....CDR2......
!     61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
!      Q   A   P   G   K   G   L   E   W   V   S   A   I   S   G
2523 |CAa|gct|ccT|GGt|aaa|ggt|ttg|gag|tgg|gtt|tct|gct|atc|tct|ggt|
!    .BstXI         |
!
!    ........................CDR2........................|--FR3--
!     76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
!      S   G   G   S   T   Y   Y   A   D   S   V   K   G   R   F
2568 |tct|ggt|ggc|agt|act|tac|tat|gct|gac|tcc|gtt|aaa|ggt|cgc|ttc|
!
!    FR3
!     91  92  93  94  95  96  97  98  99  100 101 102 103 104 105
!      T   I   S   R   D   N   S   K   N   T   L   Y   L   Q   M
2613 |act|atc|TCT|AGA|gac|aac|tct|aag|aat|act|ctc|tac|ttg|cag|atg|
!            | XbaI  |
!
!    ----------------------------FR3---------------------------->|
!     106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
!      N   S   L   R   A   E   D   T   A   V   Y   Y   C   A   K
2658 |aac|agC|TTA|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat|tgc|gct|aaa|
!           | AflII|               | PstI |
!
!    ....................CDR3....................|------FR4------
!     121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
!      D   Y   E   G   T   G   Y   A   F   D   I   W   G   Q   G
2703 |gac|tat|gaa|ggt|act|ggt|tat|gct|ttc|gaC|ATA|TGg|ggt|caa|ggt|
!                                           | NdeI | (1/4)
!
!    ------------FR4------------>|
!     136 137 138 139 140 141 142
!      T   M   V   T   V   S   S
2748 |act|atG|GTC|ACC|gtc|tct|agt
!           |BstEII |
!
! From BstEII onwards, pV323 is same as pCES1, except as noted.
! BstEII sites may occur in light chains; not likely to be unique in final
! vector.
!
!    143 144 145 146 147 148 149 150 151 152
!     A   S   T   K   G   P   S   V   F   P
2769 gcc tcc acc aaG GGC CCa tcg GTC TTC ccc
!                  Bsp120I.      BbsI...(2/2)
!                  ApaI....
!
!    153 154 155 156 157 158 159 160 161 162 163 164 165 166 167
!     L   A   P   S   S   K   S   T   S   G   G   T   A   A   L
2799 ctg gca ccC TCC TCc aag agc acc tct ggg ggc aca gcg gcc ctg
!              BseRI...(2/2)
!
!    168 169 170 171 172 173 174 175 176 177 178 179 180 181 182
!     G   C   L   V   K   D   Y   F   P   E   P   V   T   V   S
2844 ggc tgc ctg GTC AAG GAC TAC TTC CCc gaA CCG GTg acg gtg tcg
!                                          AgeI....
!
!    183 184 185 186 187 188 189 190 191 192 193 194 195 196 197
!     W   N   S   G   A   L   T   S   G   V   H   T   F   P   A
2889 tgg aac tca GGC GCC ctg acc agc ggc gtc cac acc ttc ccg gct
!                KasI...(1/4)
!
!    198 199 200 201 202 203 204 205 206 207 208 209 210 211 212
!     V   L   Q   S   S   G   L   Y   S   L   S   S   V   V   T
2934 gtc cta cag tct agc GGa ctc tac tcc ctc agc agc gta gtg acc
!                (Bsu36I...) (knocked out)
!
!    213 214 215 216 217 218 219 220 221 222 223 224 225 226 227
!     V   P   S   S   S   L   G   T   Q   T   Y   I   C   N   V
2979 gtg ccC tct tct agc tTG Ggc acc cag acc tac atc tgc aac gtg
!        (BstXI..............) N.B. destruction of BstXI & BpmI sites.
!
!    228 229 230 231 232 233 234 235 236 237 238 239 240 241 242
!     N   H   K   P   S   N   T   K   V   D   K   K   V   E   P
3024 aat cac aag ccc agc aac acc aag gtg gac aag aaa gtt gag ccc
!
!    243 244 245
!     K   S   C   A   A   A   H   H   H   H   H   H   S   A
3069 aaa tct tgt GCG GCC GCt cat cac cac cat cat cac tct gct
!
!     E   Q   K   L   I   S   E   E   D   L   N   G   A   A
3111 gaa caa aaa ctc atc tca gaa gag gat ctg aat ggt gcc gca
!
!     D   I   N   D   D   R   M   A   S   G   A
3153 GAT ATC aac gat gat cgt atg gct AGC ggc gcc
!    rEK cleavage site.
!    EcoRV.                      NheI.   KasI.
!
!    Domain 1
!     A   E   T   V   E   S   C   L   A
3183 gct gaa act gtt gaa agt tgt tta gca
!
!     K   P   H   T   E   N   S   F
3210 aaa ccc cat aca gaa aat tca ttt
!
!     T   N   V   W   K   D   D   K   T
3234 aCT AAC GTC TGG AAA GAC GAC AAA ACt
!
!     L   D   R   Y   A   N   Y   E   G   C   L   W   N   A   T   G   V
3261 tta gat cgt tac gct aac tat gag ggt tgt ctg tgG AAT GCt aca ggc gtt
!                                                  BsmI
!
!     V   V   C   T   G   D   E   T   Q   C   Y   G   T   W   V   P   I
3312 gta gtt tgt act ggt GAC GAA ACT CAG TGT TAC GGT ACA TGG GTT cct att
!
!     G   L   A   I   P   E   N
3363 ggg ctt gct atc cct gaa aat
!
!    L1 linker
!     E   G   G   G   S   E   G   G   G   S
3384 gag ggt ggt ggc tct gag ggt ggc ggt tct
!
!     E   G   G   G   S   E   G   G   G   T
3414 gag ggt ggc ggt tct gag ggt ggc ggt act
!
!    Domain 2
3444 aaa cct cct gag tac ggt gat aca cct att ccg ggc tat act tat atc aac
3495 cct ctc gac ggc act tat ccg cct ggt act gag caa aac ccc gct aat cct
3546 aat cct tct ctt GAG GAG tct cag cct ctt aat act ttc atg ttt cag aat
!                    BseRI
3597 aat agg ttc cga aat agg cag ggg gca tta act gtt tat acg ggc act
3645 gtt act caa ggc act gac ccc gtt aaa act tat tac cag tac act cct
3693 gta tca tca aaa gcc atg tat gac gct tac tgg aac ggt aaa ttc AGA
!                                                              AlwNI
3741 GAC TGc gct ttc cat tct ggc ttt aat gaa gat cca ttc gtt tgt gaa
!    AlwNI
3789 tat caa ggc caa tcg tct gac ctg cct caa cct cct gtc aat gct
!
3834 ggc ggc ggc tct
!    start L2
3846 ggt ggt ggt tct
3858 ggt ggc ggc tct
3870 gag ggt ggt ggc tct gag ggt ggc ggt tct
3900 gag ggt ggc ggc tct gag gga ggc ggt tcc
3930 ggt ggt ggc tct ggt ! end L2



!    Domain 3
!     S   G   D   F   D   Y   E   K   M   A   N   A   N   K   G   A
3945 tcc ggt gat ttt gat tat gaa aag atg gca aac gct aat aag ggg gct
!
!     M   T   E   N   A   D   E   N   A   L   Q   S   D   A   K   G
3993 atg acc gaa aat gcc gat gaa aac gcg cta cag tct gac gct aaa ggc
!
!     K   L   D   S   V   A   T   D   Y   G   A   A   I   D   G   F
4041 aaa ctt gat tct gtc gct act gat tac ggt gct gct atc gat ggt ttc
!
!     I   G   D   V   S   G   L   A   N   G   N   G   A   T   G   D
4089 att ggt gac gtt tcc ggc ctt gct aat ggt aat ggt gct act ggt gat
!
!     F   A   G   S   N   S   Q   M   A   Q   V   G   D   G   D   N
4137 ttt gct ggc tct aat tcc caa atg gct caa gtc ggt gac ggt gat aat
!
!     S   P   L   M   N   N   F   R   Q   Y   L   P   S   L   P   Q
4185 tca cct tta atg aat aat ttc cgt caa tat tta cct tcc ctc cct caa
!
!     S   V   E   C   R   P   F   V   F   S   A   G   K   P   Y   E
4233 tcg gtt gaa tgt cgc cct ttt gtc ttt agc gct ggt aaa cca tat gaa
!
!     F   S   I   D   C   D   K   I   N   L   F   R
4281 ttt tct att gat tgt gac aaa ata aac tta ttc cgt
!    End Domain 3
!
!     G   V   F   A   F   L   L   Y   V   A   T   F   M   Y   V   F140
4317 ggt gtc ttt gcg ttt ctt tta tat gtt gcc acc ttt atg tat gta ttt
!    start transmembrane segment
!
!     S   T   F   A   N   I   L
4365 tct acg ttt gct aac ata ctg
!
!     R   N   K   E   S
4386 cgt aat aag gag tct TAA ! stop of iii
!    Intracellular anchor.
!
!       M1  P2  V   L   L5  G   I   P   L   L10 L   R   F   L   G15
4404 tc ATG cca gtt ctt ttg ggt att ccg tta tta ttg cgt ttc ctc ggt
!       Start VI
4451 ttc ctt ctg gta act ttg ttc ggc tat ctg ctt act ttt ctt aaa aag
4499 ggc ttc ggt aag ata gct att gct att tca ttg ttt ctt gct ctt att
4547 att ggg ctt aac tca att ctt gtg ggt tat ctc tct gat att agc gct
4595 caa tta ccc tct gac ttt gtt cag ggt gtt cag tta att ctc ccg tct
4643 aat gcg ctt ccc tgt ttt tat gtt att ctc tct gta aag gct gct att
4691 ttc att ttt gac gtt aaa caa aaa atc gtt tct tat ttg gat tgg gat
!
!              M1  A2  V3      F5                  L10         G13
4739 aaa TAA t ATG gct gtt tat ttt gta act ggc aaa tta ggc tct gga
!        end VI Start gene I
!
!    14  15  16  17  18  19  20  21  22  23  24  25  26  27  28
!     K   T   L   V   S   V   G   K   I   Q   D   K   I   V   A
4785 aaa acg ctc gtt agc gtt ggt aaa att caa gat aaa att gta gct
!
!    29  30  31  32  33  34  35  36  37  38  39  40  41  42  43
!     G   C   K   I   A   T   N   L   D   L   R   L   Q   N   L
4830 ggg tgc aaa ata gca act aat ctt gat tta agg ctt caa aac ctc
!
!    44  45  46  47  48  49  50  51  52  53  54  55  56  57  58
!     P   Q   V   G   R   F   A   K   T   P   R   V   L   R   I
4875 ccg caa gtc ggg agg ttc gct aaa acg cct cgc gtt ctt aga ata
!
!    59  60  61  62  63  64  65  66  67  68  69  70  71  72  73
!     P   D   K   P   S   I   S   D   L   L   A   I   G   R   G
4920 ccg gat aag cct tct ata tct gat ttg ctt gct att ggg cgc ggt
!
!    74  75  76  77  78  79  80  81  82  83  84  85  86  87  88
!     N   D   S   Y   D   E   N   K   N   G   L   L   V   L   D
4965 aat gat tcc tac gat gaa aat aaa aac ggc ttg ctt gtt ctc gat
!
!    89  90  91  92  93  94  95  96  97  98  99  100 101 102 103
!     E   C   G   T   W   F   N   T   R   S   W   N   D   K   E
5010 gag tgc ggt act tgg ttt aat acc cgt tct tgg aat gat aag gaa
!
!    104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
!     R   Q   P   I   I   D   W   F   L   H   A   R   K   L   G
5055 aga cag ccg att att gat tgg ttt cta cat gct cgt aaa tta gga
!
!    119 120 121 122 123 124 125 126 127 128 129 130 131 132 133
!     W   D   I   I   F   L   V   Q   D   L   S   I   V   D   K
5100 tgg gat att att ttt ctt gtt cag gac tta tct att gtt gat aaa
!
!    134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
!     Q   A   R   S   A   L   A   E   H   V   V   Y   C   R   R
5145 cag gcg cgt tct gca tta gct gaa cat gtt gtt tat tgt cgt cgt
!
!    149 150 151 152 153 154 155 156 157 158 159 160 161 162 163
!     L   D   R   I   T   L   P   F   V   G   T   L   Y   S   L
5190 ctg gac aga att act tta cct ttt gtc ggt act tta tat tct ctt
!
!    164 165 166 167 168 169 170 171 172 173 174 175 176 177 178
!     I   T   G   S   K   M   P   L   P   K   L   H   V   G   V
5235 att act ggc tcg aaa atg cct ctg cct aaa tta cat gtt ggc gtt
!
!    179 180 181 182 183 184 185 186 187 188 189 190 191 192 193
!     V   K   Y   G   D   S   Q   L   S   P   T   V   E   R   W
5280 gtt aaa tat ggc gat tct caa tta agc cct act gtt gag cgt tgg
!
!    194 195 196 197 198 199 200 201 202 203 204 205 206 207 208
!     L   Y   T   G   K   N   L   Y   N   A   Y   D   T   K   Q
5325 ctt tat act ggt aag aat ttg tat aac gca tat gat act aaa cag
!
!    209 210 211 212 213 214 215 216 217 218 219 220 221 222 223
!     A   F   S   S   N   Y   D   S   G   V   Y   S   Y   L   T
5370 gct ttt tct agt aat tat gat tcc ggt gtt tat tct tat tta acg
!
!    224 225 226 227 228 229 230 231 232 233 234 235 236 237 238
!     P   Y   L   S   H   G   R   Y   F   K   P   L   N   L   G
5415 cct tat tta tca cac ggt cgg tat ttc aaa cca tta aat tta ggt
!
!    239 240 241 242 243 244 245 246 247 248 249 250 251 252 253
!     Q   K   M   K   L   T   K   I   Y   L   K   K   F   S   R
5460 cag aag atg aaa tta act aaa ata tat ttg aaa aag ttt tct cgc
!
!    254 255 256 257 258 259 260 261 262 263 264 265 266 267 268
!     V   L   C   L   A   I   G   F   A   S   A   F   T   Y   S
5505 gtt ctt tgt ctt gcg att gga ttt gca tca gca ttt aca tat agt
!
!    269 270 271 272 273 274 275 276 277 278 279 280 281 282 283
!     Y   I   T   Q   P   K   P   E   V   K   K   V   V   S   Q
5550 tat ata acc caa cct aag ccg gag gtt aaa aag gta gtc tct cag
!
!    284 285 286 287 288 289 290 291 292 293 294 295 296 297 298
!     T   Y   D   F   D   K   F   T   I   D   S   S   Q   R   L
5595 acc tat gat ttt gat aaa ttc act att gac tct tct cag cgt ctt
!
!    299 300 301 302 303 304 305 306 307 308 309 310 311 312 313
!     N   L   S   Y   R   Y   V   F   K   D   S   K   G   K   L
5640 aat cta agc tat cgc tat gtt ttc aag gat tct aag gga aaa TTA
!                                                            PacI
!
!    314 315 316 317 318 319 320 321 322 323 324 325 326 327 328
!     I   N   S   D   D   L   Q   K   Q   G   Y   S   L   T   Y
5685 ATT AAt agc gac gat tta cag aag caa ggt tat tca ctc aca tat
!    PacI
!
!    329 330 331 332 333 334 335 336 337 338 339 340 341 342 343
! i   I   D   L   C   T   V   S   I   K   K   G   N   S   N   E
! iv                                                       M1  K
5730 att gat tta tgt act gtt tcc att aaa aaa ggt aat tca aAT Gaa
!                                                         Start IV
!
!    344 345 346 347 348 349
! i   I   V   K   C   N   .   End of I
! iv  L3  L   N5  V   I7  N    F   V10
5775 att gtt aaa tgt aat TAA T TTT GTT
!    IV continued
!
5800 ttc ttg atg ttt gtt tca tca tct tct ttt gct cag gta att gaa atg
5848 aat aat tcg cct ctg cgc gat ttt gta act tgg tat tca aag caa tca
5896 ggc gaa tcc gtt att gtt tct ccc gat gta aaa ggt act gtt act gta
5944 tat tca tct gac gtt aaa cct gaa aat cta cgc aat ttc ttt att tct
5992 gtt tta cgt gct aat aat ttt gat atg gtt ggt tca att cct tcc ata
6040 att cag aag tat aat cca aac aat cag gat tat att gat gaa ttg cca
6088 tca tct gat aat cag gaa tat gat gat aat tcc gct cct tct ggt ggt
6136 ttc ttt gtt ccg caa aat gat aat gtt act caa act ttt aaa att aat
6184 aac gtt cgg gca aag gat tta ata cga gtt gtc gaa ttg ttt gta aag
6232 tct aat act tct aaa tcc tca aat gta tta tct att gac ggc tct aat
6280 cta tta gtt gtt TCT gca cct aaa gat att tta gat aac ctt cct caa
!                    ApaLI removed
6328 ttc ctt tct act gtt gat ttg cca act gac cag ata ttg att gag ggt
6376 ttg ata ttt gag gtt cag caa ggt gat gct tta gat ttt tca ttt gct
6424 gct ggc tct cag cgt ggc act gtt gca ggc ggt gtt aat act gac cgc
6472 ctc acc tct gtt tta tct tct gct ggt ggt tcg ttc ggt att ttt aat
6520 ggc gat gtt tta ggg cta tca gtt cgc gca tta aag act aat agc cat
6568 tca aaa ata ttg tct gtg cca cgt att ctt acg ctt tca ggt cag aag
6616 ggt tct atc tct gtT GGC CAg aat gtc cct ttt att act ggt cgt gtg
!                      MscI
6664 act ggt gaa tct gcc aat gta aat aat cca ttt cag acg att gag cgt
6712 caa aat gta ggt att tcc atg agc gtt ttt cct gtt gca atg gct ggc
6760 ggt aat att gtt ctg gat att acc agc aag gcc gat agt ttg agt tct
6808 tct act cag gca agt gat gtt att act aat caa aga agt att gct aca
6856 acg gtt aat ttg cgt gat gga cag act ctt tta ctc ggt ggc ctc act
6904 gat tat aaa aac act tct caa gat tct ggc gta ccg ttc ctg tct aaa
6952 atc cct tta atc ggc ctc ctg ttt agc tcc cgc tct gat tcc aac gag
7000 gaa agc acg tta tac gtg ctc gtc aaa gca acc ata gta cgc gcc ctg
7048 TAG cggcgcatt
!    End IV

7060 aagcgcggcg ggtgtggtgg ttacgcgcag cgtgaccgct acacttgcca gcgccctagc
7120 gcccgctcct ttcgctttct tcccttcctt tctcgccacg ttcGCCGGCt ttccccgtca
!                                                   NgoMI_
7180 agctctaaat cgggggctcc ctttagggtt ccgatttagt gctttacggc acctcgaccc
7240 caaaaaactt gatttgggtg atggttCACG TAGTGggcca tcgccctgat agacggtttt
!                                DraIII
7300 tcgccctttG ACGTTGGAGT Ccacgttctt taatagtgga ctcttgttcc aaactggaac
!             DrdI
7360 aacactcaac cctatctcgg gctattcttt tgatttataa gggattttgc cgatttcgga
7420 accaccatca aacaggattt tcgcctgctg gggcaaacca gcgtggaccg cttgctgcaa
7480 ctctctcagg gccaggcggt gaagggcaat CAGCTGttgc cCGTCTCact ggtgaaaaga
!                                     PvuII.      BsmBI.
7540 aaaaccaccc tGGATCC AAGCTT
!                BamHI  HindIII (1/2)
!    Insert carrying bla gene
7563 gcaggtg gcacttttcg gggaaatgtg cgcggaaccc
7600 ctatttgttt atttttctaa atacattcaa atatGTATCC gctcatgaga caataaccct
!                                         BciVI
8790 CCTGAGG
!    Bsu36I_
8797 ccgat actgtcgtcg tcccctcaaa ctggcagatg
8832 cacggttacg atgcgcccat ctacaccaac gtaacctatc ccattacggt caatccgccg
8892 tttgttccca cggagaatcc gacgggttgt tactcgctca catttaatgt tgatgaaagc
8952 tggctacagg aaggccagac gcgaattatt tttgatggcg ttcctattgg ttaaaaaatg
9012 agctgattta acaaaaattt aacgcgaatt ttaacaaaat attaacgttt acaATTTAAA
!                                                              SwaI...
9072 Tatttgctta tacaatcttc ctgtttttgg ggcttttctg attatcaacc GGGGTAcat
!                                                           RBS?
9131 ATG att gac atg cta gtt tta cga tta ccg ttc atc gat tct ctt gtt tgc
!    Start gene II
9182 tcc aga ctc tca ggc aat gac ctg ata gcc ttt gtA GAT CTc tca aaa ata
!                                                  BglII...
9233 gct acc ctc tcc ggc atg aat tta tca gct aga acg gtt gaa tat cat att
9284 gat ggt gat ttg act gtc tcc ggc ctt tct cac cct ttt gaa tct tta cct
9335 aca cat tac tca ggc att gca ttt aaa ata tat gag ggt tct aaa aat ttt
9386 tat cct tgc gtt gaa ata aag gct tct ccc gca aaa gta tta cag ggt cat
9437 aat gtt ttt ggt aca acc gat tta gct tta tgc tct gag gct tta ttg ctt
9488 aat ttt gct aat tct ttg cct tgc ctg tat gat tta ttg gat gtt ! 9532
!    gene II continues
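
The residue labels and row numbers in the annotated table can be cross-checked by translating a row and by confirming that each row number equals the previous one plus three bases per codon. The sketch below does this for the gene V row that begins at base 843; it assumes the standard genetic code, and the names `CODON_TABLE` and `translate` are illustrative rather than taken from the source.

```python
# Standard genetic code, built in the conventional t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(codons):
    """Translate a list of codons (any case) into one-letter amino acids."""
    return "".join(CODON_TABLE[codon.lower()] for codon in codons)


# Row 843 of the annotated table: start of gene V.
row_843 = ("ATG att aaa gtt gaa att aaa cca tct caa gcc caa "
           "ttt act act cgt").split()

protein = translate(row_843)
print(protein)

# Labels printed above the row: M1, E5, Q10, T15 (1-based residue numbers).
labels = {1: "M", 5: "E", 10: "Q", 15: "T"}
print(all(protein[pos - 1] == aa for pos, aa in labels.items()))

# The next row should start three bases per codon further along.
print(843 + 3 * len(row_843))
```

The computed start of the following row, 891, matches the table. The same check applied row by row is a quick way to spot dropped or duplicated codons in a transcription.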
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Table 120B: Sequence of MALIA3, condensed 

LOCUS MALIA3 9532 CIRCULAR 

ORIGIN
   1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT
  61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT
 121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA
 181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA
 241 TCCGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG
 301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG
 361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT
 421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA
 481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT
 541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT
 601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT
 661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG
 721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT
 781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA
 841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT
 901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG
 961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC
1621 TATTCTCACA GTGCACAGTC TGTCGTGACG CAGCCGCCCT CAGTGTCTGG GGCCCCAGGG
1681 CAGAGGGTCA CCATCTCCTG CACTGGGAGC AGCTCCAACA TCGGGGCAGG TTATGATGTA
1741 CACTGGTACC AGCAGCTTCC AGGAACAGCC CCCAAACTCC TCATCTATGG TAACAGCAAT
1801 CGGCCCTCAG GGGTCCCTGA CCGATTCTCT GGCTCCAAGT CTGGCACCTC AGCCTCCCTG
1861 GCCATCACTG GGCTCCAGGC TGAGGATGAG GCTGATTATT ACTGCCAGTC CTATGACAGC
1921 AGCCTGAGTG GCCTTTATGT CTTCGGAACT GGGACCAAGG TCACCGTCCT AGGTCAGCCC
1981 AAGGCCAACC CCACTGTCAC TCTGTTCCCG CCCTCCTCTG AGGAGCTCCA AGCCAACAAG
2041 GCCACACTAG TGTGTCTGAT CAGTGACTTC TACCCGGGAG CTGTGACAGT GGCCTGGAAG
2101 GCAGATAGCA GCCCCGTCAA GGCGGGAGTG GAGACCACCA CACCCTCCAA ACAAAGCAAC
2161 AACAAGTACG CGGCCAGCAG CTATCTGAGC CTGACGCCTG AGCAGTGGAA GTCCCACAGA
2221 AGCTACAGCT GCCAGGTCAC GCATGAAGGG AGCACCGTGG AGAAGACAGT GGCCCCTACA
2281 GAATGTTCAT AATAAACCGC CTCCACCGGG CGCGCCAATT CTATTTCAAG GAGACAGTCA
2341 TAATGAAATA CCTATTGCCT ACGGCAGCCG CTGGATTGTT ATTACTCGCG GCCCAGCCGG
2401 CCATGGCCGA AGTTCAATTG TTAGAGTCTG GTGGCGGTCT TGTTCAGCCT GGTGGTTCTT
2461 TACGTCTTTC TTGCGCTGCT TCCGGATTCA CTTTCTCTTC GTACGCTATG TCTTGGGTTC
2521 GCCAAGCTCC TGGTAAAGGT TTGGAGTGGG TTTCTGCTAT CTCTGGTTCT GGTGGCAGTA
2581 CTTACTATGC TGACTCCGTT AAAGGTCGCT TCACTATCTC TAGAGACAAC TCTAAGAATA
2641 CTCTCTACTT GCAGATGAAC AGCTTAAGGG CTGAGGACAC TGCAGTCTAC TATTGCGCTA
2701 AAGACTATGA AGGTACTGGT TATGCTTTCG ACATATGGGG TCAAGGTACT ATGGTCACCG
2761 TCTCTAGTGC CTCCACCAAG GGCCCATCGG TCTTCCCCCT GGCACCCTCC TCCAAGAGCA
2821 CCTCTGGGGG CACAGCGGCC CTGGGCTGCC TGGTCAAGGA CTACTTCCCC GAACCGGTGA
2881 CGGTGTCGTG GAACTCAGGC GCCCTGACCA GCGGCGTCCA CACCTTCCCG GCTGTCCTAC
2941 AGTCTAGCGG ACTCTACTCC CTCAGCAGCG TAGTGACCGT GCCCTCTTCT AGCTTGGGCA
3001 CCCAGACCTA CATCTGCAAC GTGAATCACA AGCCCAGCAA CACCAAGGTG GACAAGAAAG
3061 TTGAGCCCAA ATCTTGTGCG GCCGCTCATC ACCACCATCA TCACTCTGCT GAACAAAAAC
3121 TCATCTCAGA AGAGGATCTG AATGGTGCCG CAGATATCAA CGATGATCGT ATGGCTGGCG
3181 CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA TTTACTAACG
3241 TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT CTGTGGAATG
3301 CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA TGGGTTCCTA
3361 TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT TCTGAGGGTG
3421 GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT ATTCCGGGCT
3481 ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA AACCCCGCTA
3541 ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT CAGAATAATA
3601 GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT CAAGGCACTG
3661 ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG TATGACGCTT
3721 ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA GATCCATTCG
3781 TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT GCTGGCGGCG
3841 GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT GGCGGTTCTG
3901 AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT GATTTTGATT
3961 ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT GAAAACGCGC
4021 TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT GCTGCTATCG
4081 ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT GGTGATTTTG
4141 CTGGCTCTAA TTCCCAAATG GCTGAAGTCG GTGACGGTGA TAATTCACCT TTAATGAATA
4201 ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT TTTGTCTTTA
4261 GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA TTCCGTGGTG
4321 TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG TTTGCTAACA
4381 TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT TATTATTGCG
4441 TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC TTAAAAAGGG
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4501 CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG GGCTTAACTC 
4561 AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT TTGTTCAGGG
4621 TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC TCTCTGTAAA
4681 GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG ATTGGGATAA
4741 ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG CTCGTTAGCG
4801 TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT CTTGATTTAA
4861 GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT CTTAGAATAC
4921 CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT TCCTACGATG
4981 AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT ACCCGTTCTT
5041 GGAATGATAA GGAAAGAGAG CCGATTATTG ATTGGTTTCT ACATGCTCGT AAATTAGGAT
5101 GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG CGTTCTGCAT
5161 TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGAGAGAAT TACTTTACCT TTTGTCGGTA
5221 CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT GTTGGCGTTG
5281 TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT ACTGGTAAGA
5341 ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT TCCGGTGTTT
5401 ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA AATTTAGGTC
5461 AGAAGATGAA ATTAACTAAA ATATATTTGA AAAAGTTTTC TCGCGTTCTT TGTCTTGCGA
5521 TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG GAGGTTAAAA
5581 AGGTAGTCTC TGAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT CAGCGTCTTA
5641 ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT AGCGACGATT
5701 TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC ATTAAAAAAG
5761 GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT TGTTTCATCA
5821 TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT TGTAACTTGG
5881 TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG TACTGTTACT
5941 GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC TGTTTTACGT
6001 GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA TAATCCAAAC
6061 AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA TGATAATTCC
6121 GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC TTTTAAAATT
6181 AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA GTCTAATACT
6241 TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT TTCTGCACCT
6301 AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC AACTGACCAG
6361 ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA TTTTTCATTT
6421 GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG CCTCACCTCT
6481 GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT AGGGCTATCA
6541 GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG TATTCTTACG
6601 CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT TACTGGTCGT
6661 GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG TCAAAATGTA
6721 GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT TCTGGATATT
6781 ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT TACTAATCAA
6841 AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT CGGTGGCCTC
6901 ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA AATCCCTTTA
6961 ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT ATACGTGCTC
7021 GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG GTGTGGTGGT
7081 TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT TCGCTTTCTT
7141 CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC GGGGGCTCCC
7201 TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG ATTTGGGTGA
7261 TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA CGTTGGAGTC
7321 CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC CTATCTCGGG
7381 CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA ACAGGATTTT
7441 CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG CCAGGCGGTG
7501 AAGGGCAATC AGCTGTTGCC CGTCTCACTG GTGAAAAGAA AAACCACCCT GGATCCAAGC
7561 TTGCAGGTGG CACTTTTCGG GGAAATGTGC GCGGAACCCC TATTTGTTTA TTTTTCTAAA
7621 TACATTCAAA TATGTATCCG CTCATGAGAC AATAACCCTG ATAAATGCTT CAATAATATT
7681 GAAAAAGGAA GAGTATGAGT ATTCAACATT TCCGTGTCGC CCTTATTCCC TTTTTTGCGG
7741 CATTTTGCCT TCCTGTTTTT GCTCACCCAG AAACGCTGGT GAAAGTAAAA GATGCTGAAG
7801 ATCAGTTGGG CGCACGAGTG GGTTACATCG AACTGGATCT CAACAGCGGT AAGATCCTTG
7861 AGAGTTTTCG CCCCGAAGAA CGTTTTCCAA TGATGAGCAC TTTTAAAGTT CTGCTATGTC
7921 ATACACTATT ATCCCGTATT GACGCCGGGC AAGAGCAACT CGGTCGCCGG GCGCGGTATT
7981 CTCAGAATGA CTTGGTTGAG TACTCACCAG TCACAGAAAA GCATCTTACG GATGGCATGA
8041 CAGTAAGAGA ATTATGCAGT GCTGCCATAA CCATGAGTGA TAACACTGCG GCCAACTTAC
8101 TTCTGACAAC GATCGGAGGA CCGAAGGAGC TAACCGCTTT TTTGCACAAC ATGGGGGATC
8161 ATGTAACTCG CCTTGATCGT TGGGAACCGG AGCTGAATGA AGCCATACCA AACGACGAGC
8221 GTGACACCAC GATGCCTGTA GCAATGCCAA CAACGTTGCG CAAACTATTA ACTGGCGAAC
8281 TACTTACTCT AGCTTCCCGG CAACAATTAA TAGACTGGAT GGAGGCGGAT AAAGTTGCAG
8341 GACCACTTCT GCGCTCGGCC CTTCCGGCTG GCTGGTTTAT TGCTGATAAA TCTGGAGCCG
8401 GTGAGCGTGG GTCTCGCGGT ATCATTGCAG CACTGGGGCC AGATGGTAAG CCCTCCCGTA
8461 TCGTAGTTAT CTACACGACG GGGAGTCAGG CAACTATGGA TGAACGAAAT AGACAGATCG
8521 CTGAGATAGG TGCCTCACTG ATTAAGCATT GGTAACTGTC AGACCAAGTT TACTCATATA
8581 TACTTTAGAT TGATTTAAAA CTTCATTTTT AATTTAAAAG GATCTAGGTG AAGATCCTTT
8641 TTGATAATCT CATGACCAAA ATCCCTTAAC GTGAGTTTTC GTTCCACTGT ACGTAAGACC
8701 CCCAAGCTTG TCGACTGAAT GGCGAATGGC GCTTTGCCTG GTTTCCGGCA CCAGAAGCGG
8761 TGCCGGAAAG CTGGCTGGAG TGCGATCTTC CTGAGGCCGA TACTGTCGTC GTCCCCTCAA
8821 ACTGGCAGAT GCACGGTTAC GATGCGCCCA TCTACACCAA CGTAACCTAT CCCATTACGG
8881 TCAATCCGCC GTTTGTTCCC ACGGAGAATC CGACGGGTTG TTACTCGCTC ACATTTAATG
8941 TTGATGAAAG CTGGCTACAG GAAGGCCAGA CGCGAATTAT TTTTGATGGC GTTCCTATTG
9001 GTTAAAAAAT GAGCTGATTT AACAAAAATT TAACGCGAAT TTTAACAAAA TATTAACGTT
9061 TACAATTTAA ATATTTGCTT ATACAATCTT CCTGTTTTTG GGGCTTTTCT GATTATCAAC
9121 CGGGGTACAT ATGATTGACA TGCTAGTTTT ACGATTACCG TTCATCGATT CTCTTGTTTG
9181 CTCCAGACTC TCAGGCAATG ACCTGATAGC CTTTGTAGAT CTCTCAAAAA TAGCTACCCT
9241 CTCCGGCATG AATTTATCAG CTAGAACGGT TGAATATCAT ATTGATGGTG ATTTGACTGT
9301 CTCCGGCCTT TCTCACCCTT TTGAATCTTT ACCTACACAT TACTCAGGCA TTGCATTTAA
9361 AATATATGAG GGTTCTAAAA ATTTTTATCC TTGCGTTGAA ATAAAGGCTT CTCCCGCAAA
9421 AGTATTACAG GGTCATAATG TTTTTGGTAC AACCGATTTA GCTTTATGCT CTGAGGCTTT
9481 ATTGCTTAAT TTTGCTAATT CTTTGCCTTG CCTGTATGAT TTATTGGATG TT
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Table 200: Enzymes that either cut 15 or more human GLGs or have 5+-base recognition in FR3 
Typical entry: 

REname Recognition #sites
GLGid#:base# GLGid#:base# GLGid#:base# ...

BstEII Ggtnacc 2
1: 3 48: 3
There are 2 hits at base# 3
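An entry of this form can be produced, in outline, by scanning each sequence for the recognition pattern and tallying hits by position. The following is a minimal sketch under stated assumptions: the two input strings are invented placeholders (not actual GLG sequences), `find_sites` and `tally_by_base` are hypothetical helper names, base# is taken to be the 1-based start of the match, case in the recognition string is ignored, and only the strand as written is scanned.

```python
import re
from collections import Counter

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_regex(recognition):
    """Compile a recognition sequence such as 'Ggtnacc' into a regex."""
    body = "".join(IUPAC[base] for base in recognition.upper())
    # The lookahead lets overlapping sites be reported as separate hits.
    return re.compile("(?=" + body + ")")


def find_sites(recognition, sequences):
    """Return (sequence id, base#) pairs; base# is counted from 1."""
    pattern = site_regex(recognition)
    hits = []
    for seq_id, seq in sequences.items():
        for match in pattern.finditer(seq.upper()):
            hits.append((seq_id, match.start() + 1))
    return hits


def tally_by_base(hits):
    """Count how many hits fall at each base position."""
    return Counter(base for _, base in hits)


# Placeholder sequences for illustration only (NOT real GLG sequences).
demo_glgs = {1: "AAGGTCACCAT", 48: "TTGGTGACCGG"}

hits = find_sites("Ggtnacc", demo_glgs)
print("BstEII Ggtnacc", len(hits))
print(" ".join(f"{gid}: {base}" for gid, base in hits))
for base, count in tally_by_base(hits).most_common():
    print(f"There are {count} hits at base# {base}")
```

With the placeholder input the printed lines have the same shape as the typical entry above. For a non-palindromic recognition sequence, a fuller version would presumably also scan the reverse complement.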

MaeIII gtnac 36
1: 4 2: 4 3: 4 4: 4 5: 4 6: 4
7: 4 8: 4 9: 4 10: 4 11: 4 37: 4
37: 58 38: 4 38: 58 39: 4 39: 58 40: 4
40: 58 41: 4 41: 58 42: 4 42: 58 43: 4
43: 58 44: 4 44: 58 45: 4 45: 58 46: 4
46: 58 47: 4 47: 58 48: 4 49: 4 50: 58
There are 24 hits at base# 4












Tsp45I gtsac 33
1: 4 2: 4 3: 4 4: 4 5: 4 6: 4
7: 4 8: 4 9: 4 10: 4 11: 4 37: 4
37: 58 38: 4 38: 58 39: 58 40: 4 40: 58
41: 58 42: 58 43: 4 43: 58 44: 4 44: 58
45: 4 45: 58 46: 4 46: 58 47: 4 47: 58
48: 4 49: 4 50: 58
There are 21 hits at base# 4



HphI tcacc 45
1: 5 2: 5 3: 5 4: 5 5: 5 6: 5
7: 5 8: 5 11: 5 12: 5 12: 11 13: 5
14: 5 15: 5 16: 5 17: 5 18: 5 19: 5
20: 5 21: 5 22: 5 23: 5 24: 5 25: 5
26: 5 27: 5 28: 5 29: 5 30: 5 31: 5
32: 5 33: 5 34: 5 35: 5 36: 5 37: 5
38: 5 40: 5 43: 5 44: 5 45: 5 46: 5
47: 5 48: 5 49: 5
There are 44 hits at base# 5
NlaIII CATG 26
1: 9 1: 42 2: 42 3: 9 3: 42 4: 9
4: 42 5: 9 5: 42 6: 42 6: 78 7: 9
7: 42 8: 21 8: 42 9: 42 10: 42 11: 42
12: 57 13: 48 13: 57 14: 57 31: 72 38: 9
48: 78 49: 78
There are 11 hits at base# 42
There are 1 hits at base# 48 Could cause raggedness.

BsaJI Ccnngg 37
1: 14 2: 14 5: 14 6: 14 7: 14 8: 14
8: 65 9: 14 10: 14 11: 14 12: 14 13: 14
14: 14 15: 65 17: 14 17: 65 18: 65 19: 65
20: 65 21: 65 22: 65 26: 65 29: 65 30: 65
33: 65 34: 65 35: 65 37: 65 38: 65 39: 65
40: 65 42: 65 43: 65 48: 65 49: 65 50: 65
51: 14
There are 23 hits at base# 65
There are 14 hits at base# 14

AluI AGct 42
1: 47 2: 47 3: 47 4: 47 5: 47 6: 47
7: 47 8: 47 9: 47 10: 47 11: 47 16: 63
23: 63 24: 63 25: 63 31: 63 32: 63 36: 63
37: 47 37: 52 38: 47 38: 52 39: 47 39: 52
40: 47 40: 52 41: 47 41: 52 42: 47 42: 52
43: 47 43: 52 44: 47 44: 52 45: 47 45: 52
46: 47 46: 52 47: 47 47: 52 49: 15 50: 47
There are 23 hits at base# 47
There are 11 hits at base# 52 Only 5 bases from 47



BlpI GCtnagc 21
1: 48 2: 48 3: 48 5: 48 6: 48 7: 48
8: 48 9: 48 10: 48 11: 48 37: 48 38: 48
39: 48 40: 48 41: 48 42: 48 43: 48 44: 48
45: 48 46: 48 47: 48
There are 21 hits at base# 48
MwoI GCNNNNNnngc 19
1: 48 2: 28 19: 36 22: 36 23: 36 24: 36
25: 36 26: 36 35: 36 37: 67 39: 67 40: 67
41: 67 42: 67 43: 67 44: 67 45: 67 46: 67
47: 67
There are 10 hits at base# 67
There are 7 hits at base# 36



DdeI Ctnag 71
1: 49 1: 58 2: 49 2: 58 3: 49 3: 58
3: 65 4: 49 4: 58 5: 49 5: 58 5: 65
6: 49 6: 58 6: 65 7: 49 7: 58 7: 65
8: 49 8: 58 9: 49 9: 58 9: 65 10: 49
10: 58 10: 65 11: 49 11: 58 11: 65 15: 58
16: 58 16: 65 17: 58 18: 58 20: 58 21: 58
22: 58 23: 58 23: 65 24: 58 24: 65 25: 58
25: 65 26: 58 27: 58 27: 65 28: 58 30: 58
31: 58 31: 65 32: 58 32: 65 35: 58 36: 58
36: 65 37: 49 38: 49 39: 26 39: 49 40: 49
41: 49 42: 26 42: 49 43: 49 44: 49 45: 49
46: 49 47: 49 48: 12 49: 12 51: 65
There are 29 hits at base# 58
There are 22 hits at base# 49 Only nine bases from 58
There are 16 hits at base# 65 Only seven bases from 58

BglII Agatct 11
1: 61 2: 61 3: 61 4: 61 5: 61 6: 61
7: 61 9: 61 10: 61 11: 61 51: 47
There are 10 hits at base# 61

BstYI Rgatcy 12
1: 61 2: 61 3: 61 4: 61 5: 61 6: 61
7: 61 8: 61 9: 61 10: 61 11: 61 51: 47
There are 11 hits at base# 61



WO 01/79481 PCT/US01/12454

Hpy188I TCNga 17
1: 64 2: 64 3: 64 4: 64 5: 64 6: 64
7: 64 8: 64 9: 64 10: 64 11: 64 16: 57
20: 57 27: 57 35: 57 48: 67 49: 67
There are 11 hits at base# 64
There are 4 hits at base# 57
There are 2 hits at base# 67 Could be ragged.

MslI CAYNNnnRTG 44
1: 72 2: 72 3: 72 4: 72 5: 72 6: 72
7: 72 8: 72 9: 72 10: 72 11: 72 15: 72
17: 72 18: 72 19: 72 21: 72 23: 72 24: 72
25: 72 26: 72 28: 72 29: 72 30: 72 31: 72
32: 72 33: 72 34: 72 35: 72 36: 72 37: 72
38: 72 39: 72 40: 72 41: 72 42: 72 43: 72
44: 72 45: 72 46: 72 47: 72 48: 72 49: 72
50: 72 51: 72
There are 44 hits at base# 72

BsiEI CGRYcg 23
1: 74 3: 74 4: 74 5: 74 7: 74 8: 74
9: 74 10: 74 11: 74 17: 74 22: 74 30: 74
33: 74 34: 74 37: 74 38: 74 39: 74 40: 74
41: 74 42: 74 45: 74 46: 74 47: 74
There are 23 hits at base# 74

EaeI Yggccr 23
1: 74 3: 74 4: 74 5: 74 7: 74 8: 74
9: 74 10: 74 11: 74 17: 74 22: 74 30: 74
33: 74 34: 74 37: 74 38: 74 39: 74 40: 74
41: 74 42: 74 45: 74 46: 74 47: 74
There are 23 hits at base# 74

EagI Cggccg 23
1: 74 3: 74 4: 74 5: 74 7: 74 8: 74
9: 74 10: 74 11: 74 17: 74 22: 74 30: 74
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33: 74 34: 74 37: 74 38: 74 39: 74 40: 74 
41: 74 42: 74 45: 74 46: 74 47: 74 
There are 23 hits at base# 74 



HaeIII GGcc 27
1: 75 3: 75 4: 75 5: 75 7: 75 8: 75
9: 75 10: 75 11: 75 16: 75 17: 75 20: 75
22: 75 30: 75 33: 75 34: 75 37: 75 38: 75
39: 75 40: 75 41: 75 42: 75 45: 75 46: 75
47: 75 48: 63 49: 63
There are 25 hits at base# 75












Bst4CI ACNgt 65°C 63 sites. There is a third isoschizomer
1: 86 2: 86 3: 86 4: 86 5: 86 6: 86
7: 34 7: 86 8: 86 9: 86 10: 86 11: 86
12: 86 13: 86 14: 86 15: 36 15: 86 16: 53
16: 86 17: 36 17: 86 18: 86 19: 86 20: 53
20: 86 21: 36 21: 86 22: 0 22: 86 23: 86
24: 86 25: 86 26: 86 27: 53 27: 86 28: 36
28: 86 29: 86 30: 86 31: 86 32: 86 33: 36
33: 86 34: 86 35: 53 35: 86 36: 86 37: 86
38: 86 39: 86 40: 86 41: 86 42: 86 43: 86
44: 86 45: 86 46: 86 47: 86 48: 86 49: 86
50: 86 51: 0 51: 86
There are 51 hits at base# 86 All the other sites are well away



HpyCH4III ACNgt 63
1: 86 2: 86 3: 86 4: 86 5: 86 6: 86
7: 34 7: 86 8: 86 9: 86 10: 86 11: 86
12: 86 13: 86 14: 86 15: 36 15: 86 16: 53
16: 86 17: 36 17: 86 18: 86 19: 86 20: 53
20: 86 21: 36 21: 86 22: 0 22: 86 23: 86
24: 86 25: 86 26: 86 27: 53 27: 86 28: 36
28: 86 29: 86 30: 86 31: 86 32: 86 33: 36
33: 86 34: 86 35: 53 35: 86 36: 86 37: 86
38: 86 39: 86 40: 86 41: 86 42: 86 43: 86
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44: 86 45: 86 46: 86 47: 86 48: 86 49: 86 
50: 86 51: 0 51: 86 
There are 51 hits at base# 86



HinfI Gantc 43
2: 2 3: 2 4: 2 5: 2 6: 2 7: 2
8: 2 9: 2 9: 22 10: 2 11: 2 15: 2
16: 2 17: 2 18: 2 19: 2 19: 22 20: 2
21: 2 23: 2 24: 2 25: 2 26: 2 27: 2
28: 2 29: 2 30: 2 31: 2 32: 2 33: 2
33: 22 34: 22 35: 2 36: 2 37: 2 38: 2
40: 2 43: 2 44: 2 45: 2 46: 2 47: 2
50: 60
There are 38 hits at base# 2

MlyI GAGTCNNNNNn 18
2: 2 3: 2 4: 2 5: 2 6: 2 7: 2
8: 2 9: 2 10: 2 11: 2 37: 2 38: 2
40: 2 43: 2 44: 2 45: 2 46: 2 47: 2
There are 18 hits at base# 2



PleI gagtc 18
2: 2 3: 2 4: 2 5: 2 6: 2 7: 2
8: 2 9: 2 10: 2 11: 2 37: 2 38: 2
40: 2 43: 2 44: 2 45: 2 46: 2 47: 2
There are 18 hits at base# 2

AciI Ccgc 24
2: 26 9: 14 10: 14 11: 14 27: 74 37: 62
37: 65 38: 62 39: 65 40: 62 40: 65 41: 65
42: 65 43: 62 43: 65 44: 62 44: 65 45: 62
46: 62 47: 62 47: 65 48: 35 48: 74 49: 74
There are 8 hits at base# 62
There are 8 hits at base# 65
There are 3 hits at base# 14
There are 3 hits at base# 74
There are 1 hits at base# 26
There are 1 hits at base# 35
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-"- Gcgg 11 

8: 91 9: 16 10: 16 11: 16 37: 67 39: 67 
40: 67 42: 67 43: 67 45: 67 46: 67 
There are 7 hits at base# 67 
There are 3 hits at base# 16
There are 1 hits at base# 91

BsiHKAI GWGCWc 20
2: 30 4: 30 6: 30 7: 30 9: 30 10: 30
12: 89 13: 89 14: 89 37: 51 38: 51 39: 51
40: 51 41: 51 42: 51 43: 51 44: 51 45: 51
46: 51 47: 51
There are 11 hits at base# 51

Bsp1286I GDGCHc 20
2: 30 4: 30 6: 30 7: 30 9: 30 10: 30
12: 89 13: 89 14: 89 37: 51 38: 51 39: 51
40: 51 41: 51 42: 51 43: 51 44: 51 45: 51
46: 51 47: 51
There are 11 hits at base# 51

HgiAI GWGCWc 20
2: 30 4: 30 6: 30 7: 30 9: 30 10: 30
12: 89 13: 89 14: 89 37: 51 38: 51 39: 51
40: 51 41: 51 42: 51 43: 51 44: 51 45: 51
46: 51 47: 51
There are 11 hits at base# 51

BsoFI GCngc 26
2: 53 3: 53 5: 53 6: 53 7: 53 8: 53
8: 91 9: 53 10: 53 11: 53 31: 53 36: 36
37: 64 39: 64 40: 64 41: 64 42: 64 43: 64
44: 64 45: 64 46: 64 47: 64 48: 53 49: 53
50: 45 51: 53
There are 13 hits at base# 53
There are 10 hits at base# 64

TseI Gcwgc 17

2: 53 3: 53 5: 53 6: 53 7: 53 8: 53 
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9: 53 10: 53 11: 53 31: 53 36: 36 45: 64 
46: 64 48: 53 49: 53 50: 45 51: 53 
There are 13 hits at base# 53 



MnlI gagg 34
3: 67 3: 95 4: 51 5: 16 5: 67 6: 67
7: 67 8: 67 9: 67 10: 67 11: 67 15: 67
16: 67 17: 67 19: 67 20: 67 21: 67 22: 67
23: 67 24: 67 25: 67 26: 67 27: 67 28: 67
29: 67 30: 67 31: 67 32: 67 33: 67 34: 67
35: 67 36: 67 50: 67 51: 67
There are 31 hits at base# 67



HpyCH4V TGca 34
5: 90 6: 90 11: 90 12: 90 13: 90 14: 90
15: 44 16: 44 16: 90 17: 44 18: 90 19: 44
20: 44 21: 44 22: 44 23: 44 24: 44 25: 44
26: 44 27: 44 27: 90 28: 44 29: 44 33: 44
34: 44 35: 44 35: 90 36: 38 48: 44 49: 44
50: 44 50: 90 51: 44 51: 52
There are 21 hits at base# 44
There are 1 hits at base# 52



AccI GTmkac 13 5-base recognition
7: 37 11: 24 37: 16 38: 16 39: 16 40: 16
41: 16 42: 16 43: 16 44: 16 45: 16 46: 16
47: 16
There are 11 hits at base# 16

SacII CCGCgg 8 6-base recognition
9: 14 10: 14 11: 14 37: 65 39: 65 40: 65
42: 65 43: 65
There are 5 hits at base# 65
There are 3 hits at base# 14

TfiI Gawtc 24

9: 22 15: 2 16: 2 17: 2 18: 2 19: 2 
19: 22 20: 2 21: 2 23: 2 24: 2 25: 2 
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26: 2 27: 2 28: 2 29: 2 30: 2 31: 2 

32: 2 33: 2 33: 22 34: 22 35: 2 36: 2 

There are 20 hits at base# 2 

BsmAI Nnnnnngagac 19
15: 11 16: 11 20: 11 21: 11 22: 11 23: 11
24: 11 25: 11 26: 11 27: 11 28: 11 28: 56
30: 11 31: 11 32: 11 35: 11 36: 11 44: 87
48: 87
There are 16 hits at base# 11

BpmI ctccag 19
15: 12 16: 12 17: 12 18: 12 20: 12 21: 12
22: 12 23: 12 24: 12 25: 12 26: 12 27: 12
28: 12 30: 12 31: 12 32: 12 34: 12 35: 12
36: 12
There are 19 hits at base# 12

XmnI GAANNnnttc 12
37: 30 38: 30 39: 30 40: 30 41: 30 42: 30
43: 30 44: 30 45: 30 46: 30 47: 30 50: 30
There are 12 hits at base# 30

BsrI NCcagt 12
37: 32 38: 32 39: 32 40: 32 41: 32 42: 32
43: 32 44: 32 45: 32 46: 32 47: 32 50: 32
There are 12 hits at base# 32

BanII GRGCYc 11
37: 51 38: 51 39: 51 40: 51 41: 51 42: 51
43: 51 44: 51 45: 51 46: 51 47: 51
There are 11 hits at base# 51

Ecl136I GAGctc 11
37: 51 38: 51 39: 51 40: 51 41: 51 42: 51
43: 51 44: 51 45: 51 46: 51 47: 51
There are 11 hits at base# 51

SacI GAGCTc 11
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37: 51 38: 51 39: 51 40: 51 41: 51 42: 51 
43: 51 44: 51 45: 51 46: 51 47: 51 
There are 11 hits at base# 51 
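The per-enzyme summaries in Table 200 ("There are N hits at base# B", with notes such as "Could cause raggedness") follow mechanically from the GLGid:base listings. The following is a minimal sketch, assuming an entry body is available as text; `parse_hits` and `summarize` are hypothetical helper names, and the 10-base window used to flag secondary sites near the most common one is an illustrative assumption, not a threshold stated in the table.

```python
import re
from collections import Counter


def parse_hits(entry_body):
    """Extract (GLG id, base#) pairs from text such as '1: 9 1: 42 2: 42'."""
    return [(int(glg), int(base))
            for glg, base in re.findall(r"(\d+):\s*(\d+)", entry_body)]


def summarize(hits, window=10):
    """Return the most common base#, its hit count, and nearby minor sites.

    A minor site within `window` bases of the main site is reported
    because cutting there could give ends of unequal length.
    """
    counts = Counter(base for _, base in hits)
    main_base, main_count = counts.most_common(1)[0]
    nearby = {base: n for base, n in counts.items()
              if base != main_base and abs(base - main_base) <= window}
    return main_base, main_count, nearby


# The NlaIII listing from Table 200.
NLAIII = """
1: 9 1: 42 2: 42 3: 9 3: 42 4: 9
4: 42 5: 9 5: 42 6: 42 6: 78 7: 9
7: 42 8: 21 8: 42 9: 42 10: 42 11: 42
12: 57 13: 48 13: 57 14: 57 31: 72 38: 9
48: 78 49: 78
"""

hits = parse_hits(NLAIII)
main_base, main_count, nearby = summarize(hits)
print(f"{len(hits)} sites; {main_count} hits at base# {main_base}")
for base, n in sorted(nearby.items()):
    print(f"{n} hits at base# {base}, {abs(base - main_base)} bases from {main_base}")
```

For the NlaIII listing this reports the 11 hits at base# 42 and singles out the lone hit at base# 48, the site the table marks as a possible cause of raggedness.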
Table 206: Synthetic 3-23 FR3 of human heavy chains showing positions of possible cleavage sites

! Sites engineered into the synthetic gene are shown in upper case DNA
! with the RE name between vertical bars (as in |XbaI|).
! RERSs frequently found in GLGs are shown below the synthetic sequence
! with the name to the right (as in gtn ac = MaeIII(24), indicating that
! 24 of the 51 GLGs contain the site).
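The "allowed DNA" rows of this table list, for each codon, the degenerate codons that still encode the same amino acid (for example cgn or agr for R, tty for F). The following is a minimal sketch of how such an entry can be checked against the standard genetic code; `expand`, `encoded`, and `translate` are hypothetical helper names, and the check covers only the standard code.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC codes mapped to the concrete bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(codon):
    """List every concrete codon matched by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon.upper()))]


def encoded(codon):
    """Return the set of amino acids a degenerate codon can encode."""
    return {CODON_TABLE[c] for c in expand(codon)}


def translate(dna):
    """Translate a non-degenerate DNA string in frame 1."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Degenerate codons of the kind shown in the 'allowed DNA' rows.
for codon in ("cgn", "agr", "tty", "ath", "tcn", "agy", "ttr", "ctn"):
    print(codon, sorted(encoded(codon)))

# The engineered XbaI site TCTAGA read as two codons.
print(translate("TCTAGA"))
```

A degenerate codon is a valid "allowed" entry when `encoded` returns a single amino acid; a site can be engineered without changing the protein only if the site's bases fit the allowed codons at that position.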



!                 FR3
!   89  90      (codon # in synthetic 3-23)
!    R   F
   |cgc|ttc|                                                          6
! Allowed DNA
   |cgn|tty|
   |agr|   |
!  ga ntc = HinfI(38)
!  ga gtc = PleI(18)
!  ga wtc = TfiI(20)
!  gtn ac = MaeIII(24)
!  gts ac = Tsp45I(21)
!  tc acc = HphI(44)

!                 FR3
!   91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
!    T   I   S   R   D   N   S   K   N   T   L   Y   L   Q   M
   |act|atc|TCT|AGA|gac|aac|tct|aag|aat|act|ctc|tac|ttg|cag|atg|     51
! allowed
   |acn|ath|tcn|cgn|gay|aay|tcn|aar|aay|acn|ttr|tay|ttr|car|atg|
   |   |   |agy|agr|   |   |agy|   |   |   |ctn|   |ctn|   |   |
!          | XbaI  |
!  ga gac = BsmAI(16)
!  ag ct = AluI(23)
!  c tcc ag = BpmI(19)
!  g ctn agc = BlpI(21)
!  g aan nnn ttc = XmnI(12)
!  tgca = HpyCH4V(21)

!                 FR3
!  106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
!    N   S   L   R   A   E   D   T   A   V   Y   Y   C   A   K
   |aac|agC|TTA|AGg|gct|gag|gac|aCT|GCA|Gtc|tac|tat|tgc|gct|aaa|     96
! allowed
   |aay|tcn|ttr|cgn|gcn|gar|gay|acn|gcn|gtn|tay|tay|tgy|gcn|aar|
   |   |agy|ctn|agr|   |   |   |   |   |   |   |   |   |   |   |
!        | AflII |              | PstI  |
!  cc nng g = BsaJI(23)
!  ac ngt = Bst4CI(51)
!  aga tct = BglII(10)
!  ac ngt = HpyCH4III(51)
!  Rga tcY = BstYI(11)
!  ac ngt = TaaI(51)
!  c ayn nnn rtg = MslI(44)
!  cg ryc g = BsiEI(23)
!  yg gcc r = EaeI(23)
!  cg gcc g = EagI(23)
!  g gcc = HaeIII(25)
!  gag g = MnlI(31)
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Table 217: Human HC GLG FR1 Sequences 

VH Exon - Nucleotide sequence alignment 

VH1 



1- 


02 


CAG 


GTG 


CAG 


CTG 


GTG 


CAG 


TCT 


GGG 


GCT 






GTC 


TCC 


TGC 


AAG 


GCT 


TCT 


GGA 


TAC 


ACC 


1-03 


cag 


gtc 


cag 


ctT 


gtg 


cag 


tct 


ggg 


get 






gtT 


tec 


tgc 


aag 


get 


tct 


gga 


tac 


acc 


1-08 


cag 


gtg 


cag 


ctg 


gtg 


cag 


tct 


ggg 


get 






gtc 


tec 


tgc 


aag 


get 


tct 


gga 


tac 


acc 


1- 


-18 


cag 


gtT 


cag 


ctg 


gtg 


cag 


tct 


ggA 


get 






gtc 


tec 


tgc 


aag 


get 


tct 


ggT 


tac 


acc 


1- 


24 


cag 


gtC 


cag 


ctg 


gtA 


cag 


tct 


ggg 


get 






gtc 


tec 


tgc 


aag 


gTt 


tcC 


gga 


tac 


acc 


1- 


•45 


cag 


Atg 


cag 


ctg 


gtg 


cag 


tct 


ggg 


get 






gtT 


tec 


tgc 


aag 


get 


tcC 


gga 


tac 


acc 


1- 


-46 


cag 


gtg 


cag 


ctg 


gtg 


cag 


tct 


ggg 


get 






gtT 


tec 


tgc 


aag 


gcA 


tct 


gga 


tac 


acc 


1- 


•58 


caA 


Atg 


cag 


ctg 


gtg 


cag 


tct 


ggg 


Cct 






gtc 


tec 


tgc 


aag 


get 


tct 


gga 


tTc 


acc 


1- 


-69 


cag 


gtg 


cag 


ctg 


gtg 


cag 


tct 


ggg 


get 






gtc 


tec 


tgc 


aag 


get 


tct 


gga 


GGc 


acc 


1- 


•e 


cag 


gtg 


cag 


ctg 


gtg 


cag 


tct 


T9 Tf 


act 






gtc 


tec 


tgc 


aag 


get 


tct 


gga 


GGc 


acc 


1- 


•f 


Gag 


gtC 


cag 


ctg 


gtA 


cag 


tct 


ggg 


get 






Ate 


tec 


tgc 


aag 


gTt 


tct 


gga 


tac 


acc 


VH2 




















2- 


•05 


CAG 


ATC 


ACC 


TTG 


AAG 


GAG 


TCT 


GGT 


CCT 






CTG 


ACC 


TGC 


ACC 


TTC 


TCT 


GGG 


TTC 


TCA 


2- 


•26 


cag 


Gtc 


ace 


ttg 


aag 


gag 


tct 


ggt 


cct 






ctg 


ace 


tgc 


ace 


Gtc 


tct 


ggg 


ttc 


tea 


2- 


70 


cag 


Gtc 


ace 


ttg 


aag 


gag 


tct 


ggt 


cct 






ctg 


ace 


tgc 


ace 


ttc 


tct 


ggg 


ttc 


tea 


VH3 




















3- 


07 


GAG 


GTG 


CAG 


CTG 


GTG 


GAG 


TCT 


GGG 


GGA 






CTC 


TCC 


TGT 


GCA 


GCC 


TCT 


GGA 


TTC 


ACC 


3- 


■09 


gaA 


gtg 


cag 


ctg 


gtg 


gag 


tct 


ggg 


gga 






etc 


tec 


tgt 


gca 


gee 


tct 


gga 


ttc 


acc 


3-11 


Cag 


gtg 


cag 


ctg 


gtg 


gag 


tct 


ggg 


gga 






etc 


tec 


tgt 


gca 


gee 


tct 


gga 


ttc 


acc 


3- 


13 


gag 


gtg 


cag 


ctg 


gtg 


gag 


tct 


ggg 


gga 






etc 


tec 


tgt 


gca 


gee 


tct 


gga 


ttc 


acc 


3- 


15 


gag 


gtg 


cag 


ctg 


gtg 


gag 


tct 


ggg 


gga 






etc 


tec 


tgt 


gca 


gee 


tct 


gga 


ttc 


acT 


3- 


20 


gag 


gtg 


cag 


ctg 


gtg 


gag 


tct 


ggg 


gga 



GAG GTG AAG AAG CCT GGG GCC TCA GTG AAG 
TTC ACC 

gag gtg aag aag cct ggg gee tea gtg aag 
ttc acT 

gag gtg aag aag cct ggg gee tea gtg aag 
ttc acc 

gag gtg aag aag cct ggg gee tea gtg aag 
ttT acc 

gag gtg aag aag cct ggg gee tea gtg aag 
Ctc acT 

gag gtg aag aag Act ggg Tec tea gtg aag 
ttc acc 

gag gtg aag aag cct ggg gee tea gtg aag 
ttc acc 

gag gtg aag aag cct ggg Acc tea gtg aag 
ttT acT 

gag gtg aag aag cct ggg Tec tcG gtg aag 
ttc aGc 

gag gtg aag aag cct ggg Tec tcG gtg aag 
ttc aGc 

gag gtg aag aag cct ggg gcT Aca gtg aaA 
ttc acc 

ACG CTG GTG AAA CCC ACA CAG ACC CTC ACG 
CTC AGC 

GTg ctg gtg aaa ccc aca Gag acc ctc acg 
ctc age 

Gcg ctg gtg aaa ccc aca cag acc ctc acA 
ctc age 

GGC TTG GTC CAG CCT GGG GGG TCC CTG AGA 
TTT AGT 

ggc ttg gtA cag cct ggC Agg tec ctg aga 
ttt GAt 

ggc ttg gtc Aag cct ggA ggg tec ctg aga 
ttc agt 

ggc ttg gtA cag cct ggg ggg tec ctg aga 
ttc agt 

ggc ttg gtA Aag cct ggg ggg tec ctT aga 
ttc agt 

ggT Gtg gtA cGg cct ggg ggg tec ctg aga 
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etc 


tec 


tgt 


gca 


gee 


tct 


gga 


ttc 


ace 


ttt 


GAt 












3- 


•21 


gag 


gtg 


cag 


ctg 


gtg 


gag 


tct 


ggg gga ggc Ctg gtc Aag cct ggg ggg tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-23 gag gtg cag ctg Ttg gag tct ggg gga ggc ttg gtA cag cct ggg ggg tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc ttt agC
3-30 Cag gtg cag ctg gtg gag tct ggg gga ggc Gtg gtc cag cct ggg Agg tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-30.3 Cag gtg cag ctg gtg gag tct ggg gga ggc Gtg gtc cag cct ggg Agg tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-30.5 Cag gtg cag ctg gtg gag tct ggg gga ggc Gtg gtc cag cct ggg Agg tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-33 Cag gtg cag ctg gtg gag tct ggg gga ggc Gtg gtc cag cct ggg Agg tcc ctg aga ctc tcc tgt gca gcG tct gga ttc acc ttc agt
3-43 gaA gtg cag ctg gtg gag tct ggg gga gTc Gtg gtA cag cct ggg ggg tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc ttt GAt
3-48 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtA cag cct ggg ggg tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-49 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtA cag ccA ggg Cgg tcc ctg aga ctc tcc tgt Aca gcT tct gga ttc acc ttt Ggt
3-53 gag gtg cag ctg gtg gag Act ggA gga ggc ttg Atc cag cct ggg ggg tcc ctg aga ctc tcc tgt gca gcc tct ggG ttc acc GtC agt
3-64 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtc cag cct ggg ggg tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-66 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtc cag cct ggg ggg tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc GtC agt
3-72 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtc cag cct ggA ggg tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-73 gag gtg cag ctg gtg gag tct ggg gga ggc ttg gtc cag cct ggg ggg tcc ctg aAa ctc tcc tgt gca gcc tct ggG ttc acc ttc agt
3-74 gag gtg cag ctg gtg gag tcC ggg gga ggc ttA gtT cag cct ggg ggg tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc ttc agt
3-d gag gtg cag ctg gtg gag tct Cgg gga gTc ttg gtA cag cct ggg ggg tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc GtC agt

VH4












4-04 CAG GTG CAG CTG CAG GAG TCG GGC CCA GGA CTG GTG AAG CCT TCG GGG ACC CTG TCC CTC ACC TGC GCT GTC TCT GGT GGC TCC ATC AGC
4-28 cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcg gAC acc ctg tcc ctc acc tgc gct gtc tct ggt TAc tcc atc agc
4-30.1 cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcA CAg acc ctg tcc ctc acc tgc Act gtc tct ggt ggc tcc atc agc
4-30.2 cag Ctg cag ctg cag gag tcC ggc Tca gga ctg gtg aag cct tcA CAg acc ctg tcc ctc acc tgc gct gtc tct ggt ggc tcc atc agc
4-30.4 cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcA CAg acc ctg tcc ctc acc tgc Act gtc tct ggt ggc tcc atc agc
4-31 cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcA CAg acc ctg tcc ctc acc tgc Act gtc tct ggt ggc tcc atc agc
4-34 cag gtg cag ctA cag Cag tGg ggc Gca gga ctg Ttg aag cct tcg gAg acc ctg tcc ctc acc tgc gct gtc tAt ggt ggG tcc Ttc agT
4-39 cag Ctg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcg gAg acc ctg tcc ctc acc tgc Act gtc tct ggt ggc tcc atc agc
4-59 cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcg gAg acc ctg tcc ctc acc tgc Act gtc tct ggt ggc tcc atc agT
4-61 cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcg gAg acc ctg tcc ctc acc tgc Act gtc tct ggt ggc tcc Gtc agc
4-b cag gtg cag ctg cag gag tcg ggc cca gga ctg gtg aag cct tcg gAg acc ctg tcc ctc acc tgc gct gtc tct ggt TAC tcc atc agc












VH5
5-51 GAG GTG CAG CTG GTG CAG TCT GGA GCA GAG GTG AAA AAG CCC GGG GAG TCT CTG AAG ATC TCC TGT AAG GGT TCT GGA TAC AGC TTT ACC
5-a gaA gtg cag ctg gtg cag tct gga gca gag gtg aaa aag ccc ggg gag tct ctg aGg atc tcc tgt aag ggt tct gga tac agc ttt acc

VH6
6-1 CAG GTA CAG CTG CAG CAG TCA GGT CCA GGA CTG GTG AAG CCC TCG CAG ACC CTC TCA CTC ACC TGT GCC ATC TCC GGG GAC AGT GTC TCT

VH7
7-4.1 CAG GTG CAG CTG GTG CAA TCT GGG TCT GAG TTG AAG AAG CCT GGG GCC TCA GTG AAG GTT TCC TGC AAG GCT TCT GGA TAC ACC TTC ACT

Table 220: RERS sites in Human HC GLG FR1s where there are at least 20 GLGs cut



BsgI GTGCAG 71 (cuts 16/14 bases to
1: 4 1: 13 2: 13 3: 4 3: 13 4: 13
6: 13 7: 4 7: 13 8: 13 9: 4 9: 13
10: 4 10: 13 15: 4 15: 65 16: 4 16: 65
17: 4 17: 65 18: 4 18: 65 19: 4 19: 65
20: 4 20: 65 21: 4 21: 65 22: 4 22: 65
23: 4 23: 65 24: 4 24: 65 25: 4 25: 65
26: 4 26: 65 27: 4 27: 65 28: 4 28: 65
29: 4 30: 4 30: 65 31: 4 31: 65 32: 4
32: 65 33: 4 33: 65 34: 4 34: 65 35: 4
35: 65 36: 4 36: 65 37: 4 38: 4 39: 4
41: 4 42: 4 43: 4 45: 4 46: 4 47: 4
48: 4 48: 13 49: 4 49: 13 51: 4
There are 39 hits at base# 4
There are 21 hits at base# 65



-"- ctgcac 9
12: 63 13: 63 14: 63 39: 63 41: 63 42: 63
44: 63 45: 63 46: 63
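The hit lists in this table pair a GLG number with the base# at which a recognition site begins. A minimal Python sketch of such a scan follows. It assumes 1-based numbering from the first base of FR1 and the standard IUPAC ambiguity codes (n, w, r, y and so on); the helper names are illustrative and are not part of the original analysis. The example sequence is the 3-23 FR1 row listed earlier.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGTRYMKSWBDHVN", "TGCAYRKMSWVHDBN")


def reverse_complement(site):
    """Return the reverse complement of a (possibly ambiguous) site."""
    return site.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq, site):
    """Return the 1-based start of every match, overlapping ones included."""
    # A lookahead is used so that overlapping sites are all reported.
    pattern = "(?=" + "".join(IUPAC[b] for b in site.upper()) + ")"
    return [m.start() + 1 for m in re.finditer(pattern, seq.upper())]


# FR1 of GLG 3-23 as listed in the codon table (hypothetical variable name).
FR1_3_23 = "".join(
    "gag gtg cag ctg ttg gag tct ggg gga ggc ttg gta cag cct ggg ggg "
    "tcc ctg aga ctc tcc tgt gca gcc tct gga ttc acc ttt agc".split()
)

if __name__ == "__main__":
    for name, site in [("BsgI", "GTGCAG"), ("BbvI", "GCAGC"), ("AlwNI", "CAGNNNctg")]:
        print(name, site, find_sites(FR1_3_23, site))
    # The opposite-strand orientation is searched with the reverse complement.
    print("BsgI reverse:", reverse_complement("GTGCAG"))
```

Searching both a site and its reverse complement corresponds to the paired entries (for example GTGCAG and ctgcac) given for non-palindromic sites.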














BbvI GCAGC 65
1: 6 3: 6 6: 6 7: 6 8: 6 9: 6
10: 6 15: 6 15: 67 16: 6 16: 67 17: 6
17: 67 18: 6 18: 67 19: 6 19: 67 20: 6
20: 67 21: 6 21: 67 22: 6 22: 67 23: 6
23: 67 24: 6 24: 67 25: 6 25: 67 26: 6
26: 67 27: 6 27: 67 28: 6 28: 67 29: 6
30: 6 30: 67 31: 6 31: 67 32: 6 32: 67
33: 6 33: 67 34: 6 34: 67 35: 6 35: 67
36: 6 36: 67 37: 6 38: 6 39: 6 40: 6
41: 6 42: 6 43: 6 44: 6 45: 6 46: 6
47: 6 48: 6 49: 6 50: 12 51: 6
There are 43 hits at base# 6 Bolded sites very near sites listed below
There are 21 hits at base# 67

gctgc 13 
37: 9 38: 9 39: 9 40: 3 40: 9 41: 9 
42: 9 44: 3 44: 9 45: 9 46: 9 47: 9 



WO 01/79481 



70/132 



PCT/US01/12454 



50: 9 

There are 11 hits at base# 9



BsoFI GCngc 78
1: 6 3: 6 6: 6 7: 6 8: 6 9: 6
10: 6 15: 6 15: 67 16: 6 16: 67 17: 6
17: 67 18: 6 18: 67 19: 6 19: 67 20: 6
20: 67 21: 6 21: 67 22: 6 22: 67 23: 6
23: 67 24: 6 24: 67 25: 6 25: 67 26: 6
26: 67 27: 6 27: 67 28: 6 28: 67 29: 6
30: 6 30: 67 31: 6 31: 67 32: 6 32: 67
33: 6 33: 67 34: 6 34: 67 35: 6 35: 67
36: 6 36: 67 37: 6 37: 9 38: 6 38: 9
39: 6 39: 9 40: 3 40: 6 40: 9 41: 6
41: 9 42: 6 42: 9 43: 6 44: 3 44: 6
44: 9 45: 6 45: 9 46: 6 46: 9 47: 6
47: 9 48: 6 49: 6 50: 9 50: 12 51: 6
There are 43 hits at base# 6 These often occur together.
There are 11 hits at base# 9
There are 2 hits at base# 3
There are 21 hits at base# 67













TseI Gcwgc 78
1: 6 3: 6 6: 6 7: 6 8: 6 9: 6
10: 6 15: 6 15: 67 16: 6 16: 67 17: 6
17: 67 18: 6 18: 67 19: 6 19: 67 20: 6
20: 67 21: 6 21: 67 22: 6 22: 67 23: 6
23: 67 24: 6 24: 67 25: 6 25: 67 26: 6
26: 67 27: 6 27: 67 28: 6 28: 67 29: 6
30: 6 30: 67 31: 6 31: 67 32: 6 32: 67
33: 6 33: 67 34: 6 34: 67 35: 6 35: 67
36: 6 36: 67 37: 6 37: 9 38: 6 38: 9
39: 6 39: 9 40: 3 40: 6 40: 9 41: 6
41: 9 42: 6 42: 9 43: 6 44: 3 44: 6
44: 9 45: 6 45: 9 46: 6 46: 9 47: 6
47: 9 48: 6 49: 6 50: 9 50: 12 51: 6
There are 43 hits at base# 6 Often together.
There are 11 hits at base# 9
There are 2 hits at base# 3
There are 1 hits at base# 12
There are 21 hits at base# 67

MspA1I CMGckg 48
1: 7 3: 7 4: 7 5: 7 6: 7 7: 7
8: 7 9: 7 10: 7 11: 7 15: 7 16: 7
17: 7 18: 7 19: 7 20: 7 21: 7 22: 7
23: 7 24: 7 25: 7 26: 7 27: 7 28: 7
29: 7 30: 7 31: 7 32: 7 33: 7 34: 7
35: 7 36: 7 37: 7 38: 7 39: 7 40: 1
40: 7 41: 7 42: 7 44: 1 44: 7 45: 7
46: 7 47: 7 48: 7 49: 7 50: 7 51: 7
There are 46 hits at base# 7

75 



PvuII CAGctg 48
1: 7 3: 7 4: 7 5: 7 6: 7 7: 7
8: 7 9: 7 10: 7 11: 7 15: 7 16: 7
17: 7 18: 7 19: 7 20: 7 21: 7 22: 7
23: 7 24: 7 25: 7 26: 7 27: 7 28: 7
29: 7 30: 7 31: 7 32: 7 33: 7 34: 7
35: 7 36: 7 37: 7 38: 7 39: 7 40: 1
40: 7 41: 7 42: 7 44: 1 44: 7 45: 7
46: 7 47: 7 48: 7 49: 7 50: 7 51: 7
There are 46 hits at base# 7
There are 2 hits at base# 1



AluI AGct 54
1: 8 2: 8 3: 8 4: 8 4: 24 5: 8
6: 8 7: 8 8: 8 9: 8 10: 8 11: 8
15: 8 16: 8 17: 8 18: 8 19: 8 20: 8
21: 8 22: 8 23: 8 24: 8 25: 8 26: 8
27: 8 28: 8 29: 8 29: 69 30: 8 31: 8
32: 8 33: 8 34: 8 35: 8 36: 8 37: 8
38: 8 39: 8 40: 2 40: 8 41: 8 42: 8
43: 8 44: 2 44: 8 45: 8 46: 8 47: 8
48: 8 48: 82 49: 8 49: 82 50: 8 51: 8
There are 48 hits at base# 8
There are 2 hits at base# 2



DdeI Ctnag 48
1: 26 1: 48 2: 26 2: 48 3: 26 3: 48
4: 26 4: 48 5: 26 5: 48 6: 26 6: 48
7: 26 7: 48 8: 26 8: 48 9: 26 10: 26
11: 26 12: 85 13: 85 14: 85 15: 52 16: 52
17: 52 18: 52 19: 52 20: 52 21: 52 22: 52
23: 52 24: 52 25: 52 26: 52 27: 52 28: 52
29: 52 30: 52 31: 52 32: 52 33: 52 35: 30
35: 52 36: 52 40: 24 49: 52 51: 26 51: 48
There are 22 hits at base# 52 52 and 48 never together.
There are 9 hits at base# 48
There are 12 hits at base# 26 26 and 24 never together.
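The "There are N hits at base# X" lines and the "never together" notes can be checked mechanically from a hit list. The sketch below is one way to do that in Python; the function names and the small sample (a handful of DdeI entries copied from the list above, not the full table) are illustrative assumptions.

```python
from collections import Counter, defaultdict


def tally(hits):
    """Summarise (GLG number, base#) pairs.

    Returns a Counter of hits per base# and a dict mapping each GLG
    number to the set of base# positions at which it is cut.
    """
    per_base = Counter(base for _glg, base in hits)
    bases_by_glg = defaultdict(set)
    for glg, base in hits:
        bases_by_glg[glg].add(base)
    return per_base, bases_by_glg


def cooccurrences(bases_by_glg, a, b):
    """Count the GLGs that carry a site at both base# a and base# b."""
    return sum(1 for bases in bases_by_glg.values() if a in bases and b in bases)


# Sample only: a few DdeI entries from the table, for illustration.
SAMPLE = [(1, 26), (1, 48), (9, 26), (15, 52), (35, 30),
          (35, 52), (40, 24), (51, 26), (51, 48)]

if __name__ == "__main__":
    per_base, by_glg = tally(SAMPLE)
    for base, count in sorted(per_base.items()):
        print(f"There are {count} hits at base# {base}")
    print("26 and 24 together:", cooccurrences(by_glg, 26, 24))
```

A co-occurrence count of zero corresponds to a "never together" note; a non-zero count flags GLGs that would be cut at both positions.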

15 



HphI tcacc 42
1: 86 3: 86 6: 86 7: 86 8: 80 11: 86
12: 5 13: 5 14: 5 15: 80 16: 80 17: 80
18: 80 20: 80 21: 80 22: 80 23: 80 24: 80
25: 80 26: 80 27: 80 28: 80 29: 80 30: 80
31: 80 32: 80 33: 80 34: 80 35: 80 36: 80
37: 59 38: 59 39: 59 40: 59 41: 59 42: 59
43: 59 44: 59 45: 59 46: 59 47: 59 50: 59
There are 22 hits at base# 80 80 and 86 never together
There are 5 hits at base# 86
There are 12 hits at base# 59



BssKI Nccngg 50
1: 39 2: 39 3: 39 4: 39 5: 39 7: 39
8: 39 9: 39 10: 39 11: 39 15: 39 16: 39
17: 39 18: 39 19: 39 20: 39 21: 29 21: 39
22: 39 23: 39 24: 39 25: 39 26: 39 27: 39
28: 39 29: 39 30: 39 31: 39 32: 39 33: 39
34: 39 35: 19 35: 39 36: 39 37: 24 38: 24
39: 24 41: 24 42: 24 44: 24 45: 24 46: 24
47: 24 48: 39 48: 40 49: 39 49: 40 50: 24
50: 73 51: 39
There are 35 hits at base# 39 39 and 40 together twice.
There are 2 hits at base# 40

BsaJI Ccnngg 47
1: 40 2: 40 3: 40 4: 40 5: 40 7: 40
8: 40 9: 40 9: 47 10: 40 10: 47 11: 40
15: 40 18: 40 19: 40 20: 40 21: 40 22: 40
23: 40 24: 40 25: 40 26: 40 27: 40 28: 40
29: 40 30: 40 31: 40 32: 40 34: 40 35: 20
35: 40 36: 40 37: 24 38: 24 39: 24 41: 24
42: 24 44: 24 45: 24 46: 24 47: 24 48: 40
48: 41 49: 40 49: 41 50: 74 51: 40
There are 32 hits at base# 40 40 and 41 together twice
There are 2 hits at base# 41
There are 9 hits at base# 24
There are 2 hits at base# 47

BstNI CCwgg 44
PspGI ccwgg
ScrFI ($M.HpaII) CCwgg





1: 40 2: 40 3: 40 4: 40 5: 40 7: 40
8: 40 9: 40 10: 40 11: 40 15: 40 16: 40
17: 40 18: 40 19: 40 20: 40 21: 30 21: 40
22: 40 23: 40 24: 40 25: 40 26: 40 27: 40
28: 40 29: 40 30: 40 31: 40 32: 40 33: 40
34: 40 35: 40 36: 40 37: 25 38: 25 39: 25
41: 25 42: 25 44: 25 45: 25 46: 25 47: 25
50: 25 51: 40
There are 33 hits at base# 40














ScrFI CCngg 50
1: 40 2: 40 3: 40 4: 40 5: 40 7: 40
8: 40 9: 40 10: 40 11: 40 15: 40 16: 40
17: 40 18: 40 19: 40 20: 40 21: 30 21: 40
22: 40 23: 40 24: 40 25: 40 26: 40 27: 40
28: 40 29: 40 30: 40 31: 40 32: 40 33: 40
34: 40 35: 20 35: 40 36: 40 37: 25 38: 25
39: 25 41: 25 42: 25 44: 25 45: 25 46: 25
47: 25 48: 40 48: 41 49: 40 49: 41 50: 25
50: 74 51: 40
There are 35 hits at base# 40
There are 2 hits at base# 41



10 



15 



20 



25 



30 



35 



EcoO109I RGgnccy 
1: 43 2: 43 



7: 43 8: 43 
17: 46 18: 46 
23: 46 24: 46 
30: 46 31: 46 
36: 46 37: 46 
There are 22 hits at 
There are 11 hits at 
NlalV GGNncc 



3: 
9: 
19: 
25: 
32: 
43: 



1: 43 
7: 43 
15: 46 



2 
8 

15 



18: 47 19 



21: 47 
26: 47 
30: 46 



22 



27 



30 



33: 46 



33 



36: 46 



36 



43 
43 
47 



3 
9 
16 
-19 



46 



22 



46 



27 



47 



31 



47 



34 



-11 



38: 21 39 
42: 21 42 
45: 79 46 
There are 
There are 
Sau96I Ggncc 



21 
79 
21 

23 hits at 
17 hits at 



37 
39 
43 
46 



34 
4: 43 
10: 43 
20: 46 
26: 46 
33: 46 
51: 43 
base* 46 46 
base# 43 

71 
4: 43 
9: 79 
17: 46 



43 
43 
46 
46 
46 
79 



5: 43 

15: 46 

21: 46 

27: 46 

34: 46 



6: 43 

16: 46 

22: 46 

28: 46 

35: 46 



and 43 never together 



43 
43 
47 

.12 
47 
47 
46 



5: 43 6: 43 

10: 43 10: 79 

17: 47 18: 46 

20: 46 20: 47 21: 46 

23: 47 24: 47 

28: 46 28: 47 



31: 47 32: 46 



25: 47 
29: 47 
32: 47 



46 



34: 47 35: 46 35: 47 



21 
79 
79 
79 



37: 46 37: 47 



40: 
44: 
47: 



79 
21 
21 



37: 79 
41: 79 
45: 21 



base# 47 46 
base# 46 

70 



41: 21 
44: 79 
51: 43 
& 47 often together 



There 



11 hits at baseft 43 



1: 44 2: 3 2: 44 3: 44 4: 44 5: 3
5: 44 6: 44 7: 44 8: 22 8: 44 9: 44
10: 44 11: 3 12: 22 13: 22 14: 22 15: 33
15: 47 16: 47 17: 47 18: 47 19: 47 20: 47
21: 47 22: 47 23: 33 23: 47 24: 33 24: 47
25: 33 25: 47 26: 33 26: 47 27: 47 28: 47
29: 47 30: 47 31: 33 31: 47 32: 33 32: 47
33: 33 33: 47 34: 33 34: 47 35: 47 36: 47
37: 21 37: 22 37: 47 38: 21 38: 22 39: 21
39: 22 41: 21 41: 22 42: 21 42: 22 43: 80
44: 21 44: 22 45: 21 45: 22 46: 21 46: 22
47: 21 47: 22 50: 22 51: 44
There are 23 hits at base# 47 These do not occur together.
There are 11 hits at base# 44
























There are 14 hits at base# 22 These do occur together. 
There are 9 hits at base# 21 



BsmAI GTCTCNnnnn 22 



5 


1: 


58 


3: 58 


4: 


58 


5: 


58 


8: 


58 


9: 


58 




10: 


58 


13: 70 


36: 


18 


37: 


70 


38: 


70 


39: 


70 




40: 


70 


41: 70 


42: 


70 


44: 


70 


45: 


70 


46: 


70 




47: 


70 


48: 48 


49: 


48 


50: 


a c 
DO 












There are 11 hits at 


base# 70 












in 


























-"- Nnnnnngagac 27
13: 40 15: 48 16: 48 17: 48 18: 48 20: 48
21: 48 22: 48 23: 48 24: 48 25: 48 26: 48
27: 48 28: 48 29: 48 30: 10 30: 48 31: 48
32: 48 33: 48 35: 48 36: 48 43: 40 44: 40
45: 40 46: 40 47: 40
There are 20 hits at base# 48














AvaII Ggwcc 44
Sau96I ($M.HaeIII) Ggwcc 44
2: 3 5: 3 6: 44 8: 44 9: 44 10: 44
11: 3 12: 22 13: 22 14: 22 15: 33 15: 47
16: 47 17: 47 18: 47 19: 47 20: 47 21: 47
22: 47 23: 33 23: 47 24: 33 24: 47 25: 33
25: 47 26: 33 26: 47 27: 47 28: 47 29: 47
30: 47 31: 33 31: 47 32: 33 32: 47 33: 33
33: 47 34: 33 34: 47 35: 47 36: 47 37: 47
43: 80 50: 22
There are 23 hits at base# 47 44 & 47 never together
There are 4 hits at base# 44



PpuMI RGgwccy 27
6: 43 8: 43 9: 43 10: 43 15: 46 16: 46
17: 46 18: 46 19: 46 20: 46 21: 46 22: 46
23: 46 24: 46 25: 46 26: 46 27: 46 28: 46
30: 46 31: 46 32: 46 33: 46 34: 46 35: 46
36: 46 37: 46 43: 79
There are 22 hits at base# 46 43 and 46 never occur together.
There are 4 hits at base# 43






BsmFI GGGAC 





8: 


43 


37: 


46 


50: 


77 


















gtccc 










oo 










5 


15: 


48 


16: 


48 


17: 


48 


1: 


0 


1: 


0 


20: 


48 




21: 


48 


22: 


48 


23: 


48 


24: 


48 


25: 


48 


26: 


48 




27: 


48 


28: 


48 


29: 


48 


30: 


48 


31: 


48 


32: 


48 




33: 


48 


34: 


48 


35: 


48 


36: 


48 


37: 


54 


38: 


54 




39: 


54 


40: 


54 


41: 


54 


42: 


54 


43: 


54 


44: 


54 


1/1 

JU 


45: 


54 


46: 


54 


47: 


54 
















There are 20 hits at base# 48
There are 11 hits at base# 54














HinfI Gantc 80
8: 77 12: 16 13: 16 14: 16 15: 16 15: 56
15: 77 16: 16 16: 56 16: 77 17: 16 17: 56
17: 77 18: 16 18: 56 18: 77 19: 16 19: 56
19: 77 20: 16 20: 56 20: 77 21: 16 21: 56
21: 77 22: 16 22: 56 22: 77 23: 16 23: 56
23: 77 24: 16 24: 56 24: 77 25: 16 25: 56
25: 77 26: 16 26: 56 26: 77 27: 16 27: 26
27: 56 27: 77 28: 16 28: 56 28: 77 29: 16
29: 56 29: 77 30: 56 31: 16 31: 56 31: 77
32: 16 32: 56 32: 77 33: 16 33: 56 33: 77
34: 16 35: 16 35: 56 35: 77 36: 16 36: 26
36: 56 36: 77 37: 16 38: 16 39: 16 40: 16
41: 16 42: 16 44: 16 45: 16 46: 16 47: 16
48: 46 49: 46
There are 34 hits at base# 16



TfiI Gawtc 21
8: 77 15: 77 16: 77 17: 77 18: 77 19: 77
20: 77 21: 77 22: 77 23: 77 24: 77 25: 77
26: 77 27: 77 28: 77 29: 77 31: 77 32: 77
33: 77 35: 77 36: 77
There are 21 hits at base# 77






MlyI GAGTC 38

12: 16 13: 16 14: 16 15: 16 16: 16 17: 16 

18: 16 19: 16 20: 16 21: 16 22: 16 23: 16 

24: 16 25: 16 26: 16 27: 16 27: 26 28: 16 

29: 16 31: 16 32: 16 33: 16 34: 16 35: 16 

36: 16 36: 26 37: 16 38: 16 39: 16 40: 16 

41: 16 42: 16 44: 16 45: 16 46: 16 47: 16 

48: 46 49: 46 

There are 34 hits at base# 16 

GACTC 21 

15: 56 16: 56 17: 56 18: 56 19: 56 20: 56 

21: 56 22: 56 23: 56 24: 56 25: 56 26: 56 

27: 56 28: 56 29: 56 30: 56 31: 56 32: 56 

33: 56 35: 56 36: 56

There are 21 hits at base# 56 

PleI gagtc 38
12: 16 13: 16 14: 16 15: 16 16: 16 17: 16
18: 16 19: 16 20: 16 21: 16 22: 16 23: 16
24: 16 25: 16 26: 16 27: 16 27: 26 28: 16
29: 16 31: 16 32: 16 33: 16 34: 16 35: 16
36: 16 36: 26 37: 16 38: 16 39: 16 40: 16
41: 16 42: 16 44: 16 45: 16 46: 16 47: 16
48: 46 49: 46
There are 34 hits at base# 16
-"- gactc 21
15: 56 16: 56 17: 56 18: 56 19: 56 20: 56
21: 56 22: 56 23: 56 24: 56 25: 56 26: 56
27: 56 28: 56 29: 56 30: 56 31: 56 32: 56
33: 56 35: 56 36: 56
There are 21 hits at base# 56

AlwNI CAGNNNctg 26 
15: 68 16: 68 17: 68 18: 68 19: 68 20: 68 
21: 68 22: 68 23: 68 24: 68 25: 68 26: 68
27: 68 28: 68 29: 68 30: 68 31: 68 32: 68
33: 68 34: 68 35: 68 36: 68 39: 46 40: 46
41: 46 42: 46

Seqs with both expected and unexpected. ... 8
Seqs with no sites 0 



Analysis repeated using only 8 best REdaptors

Id Ntot   0   1   2   3   4   5   6   7  8+
1   301  78 101  54  32  16   9  10   1   0  281 102#1  ccgtgtattactgtgcgagaga
2   493  69 155 125  73  37  14  11   3   6  459 103#2  ctgtgtattactgtgcgagaga
3   189  52  45  38  23  18   5   4   1   3  176 108#3  ccgtgtattactgtgcgagagg
4   127  29  23  28  24  10   6   5   2   0  114 323#22 ccgtatattactgtgcgaaaga
5    78  21  25  14  11   1   4   2   0   0   72 330#23 ctgtgtattactgtgcgaaaga
6    79  15  17  25   8  11   1   2   0   0   76 439#44 ctgtgtattactgtgcgagaca
7    43  14  15   5   5   3   0   1   0   0   42 551#48 ccatgtattactgtgcgagaca
8   307  26  63  72  51  38  24  14  13   6  250 5a#49  ccatgtattactgtgcgaga
1 102#1  ccgtgtattactgtgcgagaga ccgtgtattactgtgcgagaga
2 103#2  ctgtgtattactgtgcgagaga .t....................
3 108#3  ccgtgtattactgtgcgagagg .....................g
4 323#22 ccgtatattactgtgcgaaaga ....a.............a...
5 330#23 ctgtgtattactgtgcgaaaga .t................a...
6 439#44 ctgtgtattactgtgcgagaca .t..................c.
7 551#48 ccatgtattactgtgcgagaca ..a.................c.
8 5a#49  ccatgtattactgtgcgagaAA ..a.................AA



Seqs with the expected RE site only 1463 / 1617 

Seqs with only an unexpected site 0 

Seqs with both expected and unexpected. ... 7
Seqs with no sites 0 
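
The REdaptor comparison above shows each sequence against the first one with a dot wherever the bases agree, and the columns 0 to 8+ appear to count sequences by number of mismatches to the assigned REdaptor (that reading of the header is an assumption). A minimal Python sketch of both calculations, with illustrative function names, is:

```python
def dot_alignment(ref, seq):
    """Show seq against ref, with '.' where the bases agree (case-insensitive).

    Only the overlapping length is compared, because zip() stops at the
    shorter of the two strings.
    """
    return "".join("." if r.lower() == s.lower() else s for r, s in zip(ref, seq))


def mismatch_count(ref, seq):
    """Number of positions at which seq differs from ref."""
    return sum(r.lower() != s.lower() for r, s in zip(ref, seq))


def mismatch_histogram(ref, seqs, cap=8):
    """Bucket sequences by mismatch count; the last bucket means 'cap or more'."""
    hist = [0] * (cap + 1)
    for s in seqs:
        hist[min(mismatch_count(ref, s), cap)] += 1
    return hist


if __name__ == "__main__":
    ref = "ccgtgtattactgtgcgagaga"       # REdaptor 102#1 from the table
    others = {
        "323#22": "ccgtatattactgtgcgaaaga",
        "439#44": "ctgtgtattactgtgcgagaca",
    }
    for name, seq in others.items():
        print(name, seq, dot_alignment(ref, seq), mismatch_count(ref, seq))
    print(mismatch_histogram(ref, [ref] + list(others.values())))
```

The same mismatch count, applied to each observed sequence and its closest REdaptor, would give one row of the histogram table.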
GCA TCT ACA GGA GAC AGA GTC
GCC ATC CAG ATG ACC CAG TCT
GCA TCT GTA GGA GAC AGA GTC
GAC ATC CAG ATG ACC CAG TCT
GCA TCT GTA GGA GAC AGA GTC
GAT ATT GTG ATG ACC CAG ACT
GTC ACC CCT GGA GAG CCG GCC
GAT ATT GTG ATG ACC CAG ACT
GTC ACC CCT GGA GAG CCG GCC
GAT GTT GTG ATG ACT CAG TCT
GTC ACC CTT GGA CAG CCG GCC
GAT GTT GTG ATG ACT CAG TCT
GTC ACC CTT GGA CAG CCG GCC
GAT ATT GTG ATG ACC CAG ACT
GTC ACC CCT GGA CAG CCG GCC
GAT ATT GTG ATG ACC CAG ACT
GTC ACC CCT GGA CAG CCG GCC
GAT ATT GTG ATG ACT CAG TCT
GTC ACC CCT GGA GAG CCG GCC
GAT ATT GTG ATG ACT CAG TCT
GTC ACC CCT GGA GAG CCG GCC
GAT ATT GTG ATG ACC CAG ACT
GTC ACC CTT GGA CAG CCG GCC
GAA ATT GTG TTG ACG CAG TCT
TTG TCT CCA GGG GAA AGA GCC
GAA ATT GTG TTG ACG CAG TCT
TTG TCT CCA GGG GAA AGA GCC
GAA ATA GTG ATG ACG CAG TCT
GTG TCT CCA GGG GAA AGA GCC
GAA ATA GTG ATG ACG CAG TCT
GTG TCT CCA GGG GAA AGA GCC
GAA ATT GTG TTG ACA CAG TCT
TTG TCT CCA GGG GAA AGA GCC
GAA ATT GTG TTG ACA CAG TCT
TTG TCT CCA GGG GAA AGA GCC
GAA ATT GTA ATG ACA CAG TCT
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gga 


ggg 








CAG 


ACT 


GTG 


GTG 








TCC 


CCT 


GGA 


GGG 



act cag cca CCC tca gtg tcA gtg acG gcc agG atT acc tgT ! 3h
acA cag cTa CCC tcG gtg tcA gtg aca gcc agG atc acc tgc ! 3e
aTG cag cca CCC tcG gtg tcA gtg acG gcc agG atc acc tgc ! 3m
acA cag cca Tcc tca gtg tcA gtg aca gcc agG atc acc tgc ! V2-19
ACT CAG CCC CCG TCT GCA TCT GCC TCG ATC AAG CTC ACC TGC ! 4c
act caA TcA TcC tct gcc tct gcT tcg Gtc aag ctc acc tgc ! 4a
act caA TcG ccC tct gcc tct gcc tcg Gtc aag ctc acc tgc ! 4b
ACT CAG CCA CCT TCC TCC TCC GCA TCC GCC AGA CTC ACC TGC ! 5e
act cag ccG Gct tcc CTc tcT aca tcA gcc agT ctc acc tgc ! 5c
act cag cca Tct tcc CAT tcT gca tcA gTc aga ctc acc tgc ! 5b
ACT CAG CCC CAC TCT GTG TCG GAG ACG GTA ACC ATC TCC TGC ! 6a
ACT CAG GAG CCC TCA CTG ACT GTG ACA GTC ACT CTC ACC TGT ! 7a
act cag gag CCC tca ctg act gtg aca gtc act ctc acc tgt ! 7b
ACC CAG GAG CCA TCG TTC TCA GTG ACA GTC ACA CTC ACT TGT ! 8a

VL9

CAG CCT GTG CTG ACT CAG CCA CCT TCT GCA TCA GCC 

TCC CTG GGA GCC TCG GTC ACA CTC ACC TGC ! 9a 

VL10 

CAG GCA GGG CTG ACT CAG CCA CCC TCG GTG TCC AAG 

GGC TTG AGA CAG ACC GCC ACA CTC ACC TGC ! 10a 
Table 405 RERSs found in human lambda FR1 GLGs
! There are 31 lambda GLGs
MlyI NnnnnnGACTC 25
1: 6 3: 6 4: 6 6: 6 7: 6 8: 6
9: 6 10: 6 11: 6 12: 6 15: 6 16: 6
20: 6 21: 6 22: 6 23: 6 23: 50 24: 6
25: 6 25: 50 26: 6 27: 6 28: 6 30: 6
31: 6
There are 23 hits at base# 6
GAGTCNNNNNn 1
26: 34



MwoI GCNNNNNnngc 20
1: 9 2: 9 3: 9 4: 9 11: 9 11: 56
12: 9 13: 9 14: 9 16: 9 17: 9 18: 9
19: 9 20: 9 23: 9 24: 9 25: 9 26: 9
30: 9 31: 9
There are 19 hits at base# 9
HinfI Gantc 27
1: 12 3: 12 4: 12 6: 12 7: 12 8: 12
9: 12 10: 12 11: 12 12: 12 15: 12 16: 12
20: 12 21: 12 22: 12 23: 12 23: 46 23: 56
24: 12 25: 12 25: 56 26: 12 26: 34 27: 12
28: 12 30: 12 31: 12
There are 23 hits at base# 12
PleI gactc 25
1: 12 3: 12 4: 12 6: 12 7: 12 8: 12
9: 12 10: 12 11: 12 12: 12 15: 12 16: 12
20: 12 21: 12 22: 12 23: 12 23: 56 24: 12
25: 12 25: 56 26: 12 27: 12 28: 12 30: 12
31: 12
There are 23 hits at base# 12
gagtc 1
26: 34

DdeI Ctnag 32
1: 14 2: 24 3: 14 3: 24 4: 14 4: 24
5: 24 6: 14 7: 14 7: 24 8: 14 9: 14
10: 14 11: 14 11: 24 12: 14 12: 24 15: 5
15: 14 16: 14 16: 24 19: 24 20: 14 23: 14
24: 14 25: 14 26: 14 27: 14 28: 14 29: 30
30: 14 31: 14
There are 21 hits at base# 14



BsaJI Ccnngg 38
1: 23 1: 40 2: 39 2: 40 3: 39 3: 40
4: 39 4: 40 5: 39 11: 39 12: 38 12: 39
13: 23 13: 39 14: 23 14: 39 15: 38 16: 39
17: 23 17: 39 18: 23 18: 39 21: 38 21: 39
21: 47 22: 38 22: 39 22: 47 26: 40 27: 39
28: 39 29: 14 29: 39 30: 38 30: 39 30: 47
31: 23 31: 32
There are 17 hits at base# 39
There are 5 hits at base# 38
There are 5 hits at base# 40 Makes cleavage ragged.





MnlI cctc 35
1: 23 2: 23 3: 23 4: 23 5: 23 6: 19
6: 23 7: 19 8: 23 9: 19 9: 23 10: 23
11: 23 13: 23 14: 23 16: 23 17: 23 18: 23
19: 23 20: 47 21: 23 21: 29 21: 47 22: 23
22: 29 22: 35 22: 47 23: 26 23: 29 24: 27
27: 23 28: 23 30: 35 30: 47 31: 23
There are 21 hits at base# 23
There are 3 hits at base# 19
There are 3 hits at base# 29
There are 1 hits at base# 26
There are 1 hits at base# 27


Thes 


e could 


make 


cle« 






gagg 








7 










35 


1: 


48 


2: 48 3: 


48 


4: 


48 


27: 


44 


28: 


44 
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29: 44 



BssKI Nccngg 39

1: 40 2: 39 3: 39 3: 40 4: 39 4: 40
5: 39 6: 31 6: 39 7: 31 7: 39 8: 39
9: 31 9: 39 10: 39 11: 39 12: 38 12: 52
13: 39 13: 52 14: 52 16: 39 16: 52 17: 39
17: 52 18: 39 18: 52 19: 39 19: 52 21: 38
22: 38 23: 39 24: 39 26: 39 27: 39 28: 39
29: 14 29: 39 30: 38

There are 21 hits at base# 39
There are 4 hits at base# 38
There are 3 hits at base# 31
There are 3 hits at base# 40 Ragged
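
The "There are N hits at base# X" summary lines follow from tallying the positions in the hit list. A minimal sketch, using the BssKI entries listed above as (sequence number, base) pairs; the variable and function names are hypothetical, and the sketch does not decide which positions a table chooses to summarize:

```python
from collections import Counter

# (sequence number, base#) pairs as listed for BssKI in the table above.
BSSKI_HITS = [
    (1, 40), (2, 39), (3, 39), (3, 40), (4, 39), (4, 40),
    (5, 39), (6, 31), (6, 39), (7, 31), (7, 39), (8, 39),
    (9, 31), (9, 39), (10, 39), (11, 39), (12, 38), (12, 52),
    (13, 39), (13, 52), (14, 52), (16, 39), (16, 52), (17, 39),
    (17, 52), (18, 39), (18, 52), (19, 39), (19, 52), (21, 38),
    (22, 38), (23, 39), (24, 39), (26, 39), (27, 39), (28, 39),
    (29, 14), (29, 39), (30, 38),
]

def hits_per_base(hits):
    """Count how many listed sites fall at each base position."""
    return Counter(base for _, base in hits)

counts = hits_per_base(BSSKI_HITS)
for base, n in counts.most_common():
    print(f"There are {n} hits at base# {base}")
```

A position shared by most sequences (here base# 39) marks a site that cuts the family at a common location; sites one base either side of it (38 and 40) are the ones the tables flag as making cleavage ragged.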

BstNI CCwgg 30

1: 41 2: 40 5: 40 6: 40 7: 40 8: 40
9: 40 10: 40 11: 40 12: 39 12: 53 13: 40
13: 53 14: 53 16: 40 16: 53 17: 40 17: 53
18: 40 18: 53 19: 53 21: 39 22: 39 23: 40
24: 40 27: 40 28: 40 29: 15 29: 40 30: 39

There are 17 hits at base# 40
There are 7 hits at base# 53
There are 4 hits at base# 39
There are 1 hits at base# 41 Ragged



PspGI ccwgg 30

1: 41 2: 40 5: 40 6: 40 7: 40 8: 40
9: 40 10: 40 11: 40 12: 39 12: 53 13: 40
13: 53 14: 53 16: 40 16: 53 17: 40 17: 53
18: 40 18: 53 19: 53 21: 39 22: 39 23: 40
24: 40 27: 40 28: 40 29: 15 29: 40 30: 39

There are 17 hits at base# 40
There are 7 hits at base# 53
There are 4 hits at base# 39
There are 1 hits at base# 41



ScrFI CCngg 39

1: 41 2: 40 3: 40 3: 41 4: 40 4: 41
5: 40 6: 32 6: 40 7: 32 7: 40 8: 40
9: 32 9: 40 10: 40 11: 40 12: 39 12: 53
13: 40 13: 53 14: 53 16: 40 16: 53 17: 40
17: 53 18: 40 18: 53 19: 40 19: 53 21: 39
22: 39 23: 40 24: 40 26: 40 27: 40 28: 40
29: 15 29: 40 30: 39

There are 21 hits at base# 40
There are 4 hits at base# 39
There are 3 hits at base# 41

MaeIII gtnac 16

1: 52 2: 52 3: 52 4: 52 5: 52 6: 52
7: 52 9: 52 26: 52 27: 10 27: 52 28: 10
28: 52 29: 10 29: 52 30: 52

There are 13 hits at base# 52

Tsp45I gtsac 15

1: 52 2: 52 3: 52 4: 52 5: 52 6: 52
7: 52 9: 52 27: 10 27: 52 28: 10 28: 52
29: 10 29: 52 30: 52

There are 12 hits at base# 52



HphI tcacc 26

1: 53 2: 53 3: 53 4: 53 5: 53 6: 53
7: 53 8: 53 9: 53 10: 53 11: 59 13: 59
14: 59 17: 59 18: 59 19: 59 20: 59 21: 59
22: 59 23: 59 24: 59 25: 59 27: 59 28: 59
30: 59 31: 59

There are 16 hits at base# 59
There are 10 hits at base# 53

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BspMI ACCTGCNNNNn 14 
11: 61 13: 61 14: 61 17: 61 18: 61 19: 61 
20: 61 21: 61 22: 61 23: 61 24: 61 25: 61 
30: 61 31: 61 

There are 14 hits at base# 61 Goes into CDR1 
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Table 500: h3401-h2 captured Via CJ with BsmAI 

! 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
! S   A   Q   D   I   Q   M   T   Q   S   P   A   T   L   S
  aGT GCA Caa gac atc cag atg acc cag tct cca gcc acc ctg tct
! ApaLI...                                  a gcc acc ! L25,L6,L20,L2,L16,A11
! Extender Bridge ...

! 16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
! V   S   P   G   E   R   A   T   L   S   C   R   A   S   Q
  gtg tct cca ggg gaa agg gcc acc ctc tcc tgc agg gcc agt cag

! 31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
! S   V   S   N   N   L   A   W   Y   Q   Q   K   P   G   Q
  agt gtt agt aac aac tta gcc tgg tac cag cag aaa cct ggc cag

! 46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
! V   P   R   L   L   I   Y   G   A   S   T   R   A   T   D
  gtt ccc agg ctc ctc atc tat ggt gca tcc acc agg gcc act gat

! 61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
! I   P   A   R   F   S   G   S   G   S   G   T   D   F   T
  atc cca gcc agg ttc agt ggc agt ggg tct ggg aca gac ttc act

! 76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
! L   T   I   S   R   L   E   P   E   D   F   A   V   Y   Y
  ctc acc atc agc aga ctg gag cct gaa gat ttt gca gtg tat tac

! 91  92  93  94  95  96  97  98  99  100 101 102 103 104 105
! C   Q   R   Y   G   S   S   P   G   W   T   F   G   Q   G
  tgt cag cgg tat ggt agc tca ccg ggg tgg acg ttc ggc caa ggg

! 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
! T   K   V   E   I   K   R   T   V   A   A   P   S   V   F
  acc aag gtg gaa atc aaa cga act gtg gct gca cca tct gtc ttc

! 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
! I   F   P   P   S   D   E   Q   L   K   S   G   T   A   S
  atc ttc ccg cca tct gat gag cag ttg aaa tct gga act gcc tct

! 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
! V   V   C   L   L   N   N   F   Y   P   R   E   A   K   V
  gtt gtg tgc ctg ctg aat aac ttc tat ccc aga gag gcc aaa gta

! 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165
! Q   W   K   V   D   N   A   L   Q   S   G   N   S   Q   E
  cag tgg aag gtg gat aac gcc ctc caa tcg ggt aac tcc cag gag

! 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
! S   V   T   E   Q   D   S   K   D   S   T   Y   S   L   S
  agt gtc aca gag cag gac agc aag gac agc acc tac agc ctc agc

! 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195
! S   T   L   T   L   S   K   A   D   Y   E   K   H   K   V
  agc acc ctg acg ctg agc aaa gca gac tac gag aaa cac aaa gtc

! 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210
! Y   A   C   E   V   T   H   Q   G   L   S   S   P   V   T
  tac gcc tgc gaa gtc acc cat cag ggc ctg agc tcg cct gtc aca

! 211 212 213 214 215 216 217 218 219 220 221 222 223
! K   S   F   N   K   G   E   C   K   G   E   F   A
  aag agc ttc aac aaa gga gag tgt aag ggc gaa ttc gc..
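
Each block of the table pairs a row of one-letter amino-acid codes with the codons beneath it, so a row can be checked by translating the DNA with the standard genetic code. The sketch below is a consistency check only; the function and table names are hypothetical and nothing here is part of the described method.

```python
from itertools import product

# Standard genetic code, with codons ordered t, c, a, g at each position.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a spaced or unspaced DNA string; a trailing partial codon is ignored."""
    seq = "".join(dna.lower().split())
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))

# Codons 1-15 of Table 500 as printed above.
row = "aGT GCA Caa gac atc cag atg acc cag tct cca gcc acc ctg tct"
print(translate(row))
```

A codon that is not made of a, c, g and t raises a `KeyError`, which makes the same function a quick way to spot characters misread during transcription.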






Table 501: h3401-d8 KAPPA captured with CJ and BsmAI

! 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
! S   A   Q   D   I   Q   M   T   Q   S   P   A   T   L   S
  aGT GCA Caa gac atc cag atg acc cag tct cct gcc acc ctg tct
! ApaLI...                                    gcc acc ! L25,L6,L20,L2,L16,A11
!                                           A GCC ACC CTG TCT

! 16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
! V   S   P   G   E   R   A   T   L   S   C   R   A   S   Q
  gtg tct cca ggt gaa aga gcc acc ctc tcc tgc agg gcc agt cag
! GTG TCT CCA GGG GAA AGA GCC ACC CTC TCC TGC ! L2

! 31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
! N   L   L   S   N   L   A   W   Y   Q   Q   K   P   G   Q
  aat ctt ctc agc aac tta gcc tgg tac cag cag aaa cct ggc cag

! 46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
! A   P   R   L   L   I   Y   G   A   S   T   G   A   I   G
  gct ccc agg ctc ctc atc tat ggt gct tcc acc ggg gcc att ggt

! 61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
! I   P   A   R   F   S   G   S   G   S   G   T   E   F   T
  atc cca gcc agg ttc agt ggc agt ggg tct ggg aca gag ttc act

! 76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
! L   T   I   S   S   L   Q   S   E   D   F   A   V   Y   F
  ctc acc atc agc agc ctg cag tct gaa gat ttt gca gtg tat ttc

! 91  92  93  94  95  96  97  98  99  100 101 102 103 104 105
! C   Q   Q   Y   G   T   S   P   P   T   F   G   G   G   T
  tgt cag cag tat ggt acc tca ccg ccc act ttc ggc gga ggg acc

! 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
! K   V   E   I   K   R   T   V   A   A   P   S   V   F   I
  aag gtg gag atc aaa cga act gtg gct gca cca tct gtc ttc atc

! 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
! F   P   P   S   D   E   Q   L   K   S   G   T   A   S   V
  ttc ccg cca tct gat gag cag ttg aaa tct gga act gcc tct gtt

! 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
! V   C   P   L   N   N   F   Y   P   R   E   A   K   V   Q
  gtg tgc ccg ctg aat aac ttc tat ccc aga gag gcc aaa gta cag

! 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165
! W   K   V   D   N   A   L   Q   S   G   N   S   Q   E   S
  tgg aag gtg gat aac gcc ctc caa tcg ggt aac tcc cag gag agt

! 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
! V   T   E   Q   D   N   K   D   S   T   Y   S   L   S   S
  gtc aca gag cag gac aac aag gac agc acc tac agc ctc agc agc

! 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195
! T   L   T   L   S   K   V   D   Y   E   K   H   E   V   Y
  acc ctg acg ctg agc aaa gta gac tac gag aaa cac gaa gtc tac

! 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210
! A   C   E   V   T   H   Q   G   L   S   S   P   V   T   K
  gcc tgc gaa gtc acc cat cag ggc ctt agc tcg ccc gtc acg aag

! 211 212 213 214 215 216 217 218 219 220 221 222 223
! S   F   N   R   G   E   C   K   K   E   F   V
  agc ttc aac agg gga gag tgt aag aaa gaa ttc gtt t

WO 01/79481 



106/132 



PCT/US01/12454 



[Page 106/132: table printed sideways in the original; its contents did not survive text extraction.]



[Page 107/132: table printed sideways in the original; its contents did not survive text extraction.]



[Page 108/132: table printed sideways in the original; its contents did not survive text extraction.]




[Page 109/132: table printed sideways in the original; its contents did not survive text extraction.]



[Page 110/132: table printed sideways in the original; its contents did not survive text extraction.]



[Page 111/132: table printed sideways in the original; its contents did not survive text extraction.]






What happens in the top strand:

(VL133-2a2*) 5'-g tct cct g|ga cag tcg atc
(VL133-31*)  5'-g gcc ttg g|ga cag aca gtc
(VL133-2c*)  5'-g tct cct g|ga cag tca gtc
(VL133-1c*)  5'-g gcc cca g|gg cag agg gtc

The following Extenders and Bridges all encode the AA sequence of 2a2 codons 1-15.

!                                  1
(ON_LamEx133) 5'-ccTcTgAcTgAgT gcA cAg -
! 2   3   4   5   6   7   8   9   10  11  12
  AGt gcT TtA acC caA ccG gcT AGT gtT AGC ggT-
! 13  14  15
  tcC ccG g ! 2a2

(ON_LamB1-133) [RC] 5'-ccTcTgAcTgAgT gcA cAg -
! 2   3   4   5   6   7   8   9   10  11  12
  AGt gcT TtA acC caA ccG gcT AGT gtT AGC ggT-

(ON_LamB2-133) [RC] 5'-CcTcTgAcTgAgT gcA cAg -
! 2   3   4   5   6   7   8   9   10  11  12
  AGt gcT TtA acC caA ccG gcT AGT gtT AGC ggT-

(ON_LamB3-133) [RC] 5'-CcTcTgAcTgAgT gcA cAg -
! 2   3   4   5   6   7   8   9   10  11  12
  AGt gcT TtA acC caA ccG gcT AGT gtT AGC ggT-

(ON_LamB4-133) [RC] 5'-CcTcTgAcTgAgT gcA cAg -
! 2   3   4   5   6   7   8   9   10  11  12
  AGt gcT TtA acC caA ccG gcT AGT gtT AGC ggT-

(ON_Lam133PCR) 5'-ccTcTgAcTgAgT gcA cAg AGt gc-3'
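
The relationships among these oligonucleotides can be checked mechanically. The sketch below assumes that the full extender is the first line of ON_LamEx133 joined to its continuation lines, and that "[RC]" means the bridge is used as the reverse complement of the sequence shown; both are readings of the listing, not statements from the text, and the helper names are hypothetical.

```python
# Complement map for both cases, so the mixed-case notation above survives.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")

def normalize(oligo):
    """Drop spacing and case so sequences can be compared base by base."""
    return "".join(oligo.split()).lower()

def reverse_complement(oligo):
    """Return the reverse complement, read 5' to 3'."""
    return oligo.translate(COMPLEMENT)[::-1]

# ON_LamEx133 with its continuation lines joined (assumed reading).
extender = normalize(
    "ccTcTgAcTgAgT gcA cAg AGt gcT TtA acC caA ccG gcT AGT gtT AGC ggT tcC ccG g"
)
pcr_primer = normalize("ccTcTgAcTgAgT gcA cAg AGt gc")

print(extender.startswith(pcr_primer))   # the PCR primer matches the extender's 5' end
print("gtgcac" in extender)              # ApaLI recognition sequence GTGCAC is present
print(reverse_complement(pcr_primer))    # strand that would pair with the primer
```

Under these assumptions the PCR primer is simply the first 24 bases of the extender, and the ApaLI site spans the junction between the leader and codon 1.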
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